
AN ENZYME CAPABLE OF DEGRADING CELLULOSE OR HEMICELLULOSE

FIELD OF INVENTION 

The present invention relates to a cellulose- or hemicellulose-
degrading enzyme, a DNA construct coding for the enzyme, a
method of producing the enzyme, and an agent for degrading
cellulose or hemicellulose comprising the enzyme.

BACKGROUND OF THE INVENTION

Enzymes which are able to degrade cellulose have previously
been suggested for the conversion of biomass into liquid fuel,
gas and feed protein. However, the production of fermentable
sugars from biomass by means of cellulolytic enzymes is not yet
able to compete economically with, for instance, the production
of glucose from starch by means of α-amylase due to the
inefficiency of the currently used cellulolytic enzymes.
Cellulolytic enzymes may furthermore be used in the brewing
industry for the degradation of β-glucans, in the baking
industry for improving the properties of flour, in paper pulp
processing for removing the non-crystalline parts of cellulose,
thus increasing the proportion of crystalline cellulose in the
pulp, and in animal feed for improving the digestibility of
glucans. A further important use of cellulolytic enzymes is for
textile treatment, e.g. for reducing the harshness of cotton-
containing fabrics (cf., for instance, GB 1 368 599 or US
4,435,307), for soil removal and colour clarification of
fabrics (cf., for instance, EP 220 016) or for providing a
localized variation in colour to give the fabrics a "stone-
washed" appearance (cf., for instance, EP 307 564).

The practical exploitation of cellulolytic enzymes has, to some
extent, been set back by the nature of the known cellulase
preparations which are often complex mixtures of a variety of
single cellulase components, and which may have a rather low
specific activity. It is difficult to optimise the production
of single components in multiple enzyme systems and thus to
implement industrial cost-effective production of cellulolytic
enzymes, and their actual use has been hampered by difficulties
arising from the need to employ rather large quantities of the
enzymes to achieve the desired effect.

The drawbacks of previously suggested cellulolytic enzymes may 
be remedied by using single-component enzymes selected for a 
high specific activity. 

Single-component cellulolytic enzymes have been isolated from,
e.g. Trichoderma reesei (cf. Teeri et al., Gene 51, 1987, pp.
43-52; P.M. Abuja, Biochem. Biophys. Res. Comm. 156, 1988, pp.
180-185; and P.J. Kraulis, Biochemistry 28, 1989, pp. 7241-
7257). The T. reesei cellulases have been found to be composed
of a terminal A region responsible for binding to cellulose, a
B region linking the A region to the core of the enzyme, and a
core containing the catalytically active domain. The A region
of different T. reesei cellulases has been found to be highly
conserved, and a strong homology has also been observed with a
cellulase produced by Phanerochaete chrysosporium (Sims et al.,
Gene 24, 1988, pp. 411-422).

SUMMARY OF THE INVENTION 

It has surprisingly been found that other fungi, which are not
closely related to either Trichoderma reesei or Phanerochaete
chrysosporium, are capable of producing enzymes which contain
a region which is homologous to the A region of T. reesei
cellulases.

Accordingly, the present invention relates to a cellulose- or
hemicellulose-degrading enzyme which is derivable from a fungus
other than Trichoderma or Phanerochaete, and which comprises a
carbohydrate binding domain homologous to a terminal A region
of Trichoderma reesei cellulases, which carbohydrate binding
domain comprises the following amino acid sequence

1                                   10
Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa

            20                                      30
Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa

Xaa

or a subsequence thereof capable of effecting binding of the
enzyme to an insoluble cellulosic or hemicellulosic substrate.
"Xaa" is intended to indicate variations in the amino acid
sequence of the carbohydrate binding domain of different
enzymes. A hyphen is intended to indicate a "gap" in the amino
acid sequence (compared to other, similar enzymes).



In the present context, the term "cellulose" is intended to
include soluble and insoluble, amorphous and crystalline forms
of cellulose. The term "hemicellulose" is intended to include
glucans (apart from starch), mannans, xylans, arabinans or
polyglucuronic or polygalacturonic acid. The term "carbohydrate
binding domain" ("CBD") is intended to indicate an amino acid
sequence capable of effecting binding of the enzyme to a
carbohydrate substrate, in particular cellulose or
hemicellulose as defined above. The term "homologous" is
intended to indicate a high degree of identity in the sequence
of amino acids constituting the carbohydrate binding domain of
the present enzyme and the amino acids constituting the A
region found in T. reesei cellulases ("A region" is the term
used to denote the cellulose (i.e. carbohydrate) binding domain
of T. reesei cellulases).

It is currently believed that cellulose- or hemicellulose-
degrading enzymes which contain a sequence of amino acids which
is identifiable as a carbohydrate binding domain (or "A region")
based on its homology to the A region of T. reesei cellulases
possess certain desirable characteristics as a result of the
function of the carbohydrate binding domain in the enzyme
molecule, which is to mediate binding to solid substrates
(including cellulose) and consequently to enhance the activity
of such enzymes towards such substrates. The identification and
preparation of carbohydrate binding domain-containing enzymes
from a variety of microorganisms is therefore of considerable
interest.

Cellulose- or hemicellulose-degrading enzymes of the invention
may conveniently be identified by screening genomic or cDNA
libraries of different fungi with a probe comprising at least
part of the DNA encoding the A region of T. reesei cellulases.
Due to the intraspecies (i.e. different T. reesei cellulases)
and interspecies homology observed for the carbohydrate binding
domains of different cellulose- or hemicellulose-degrading
enzymes, there is reason to believe that this screening method
constitutes a convenient way of isolating enzymes of current
interest.

DETAILED DISCLOSURE OF THE INVENTION

Carbohydrate binding domain (CBD) containing enzymes of the
invention may, in particular, be derivable from strains of
Humicola, e.g. Humicola insolens, Fusarium, e.g. Fusarium
oxysporum, or Myceliophthora, e.g. Myceliophthora thermophila.

Some of the variations in the amino acid sequence shown above
appear to be "conservative", i.e. certain amino acids are
preferred in these positions among the various CBD-containing
enzymes of the invention. Thus, in position 1 of the sequence
shown above, the amino acid is preferentially Trp or Tyr. In
position 2, the amino acid is preferentially Gly or Ala. In
position 7, the amino acid is preferentially Gln, Ile or Asn.
In position 8, the amino acid is preferentially Gly or Asn. In
position 9, the amino acid is preferentially Trp, Phe or Tyr.
In position 10, the amino acid is preferentially Ser, Asn, Thr
or Gln. In position 12, the amino acid is preferentially Pro,
Ala or Cys. In position 13, the amino acid is preferentially
Thr, Arg or Lys. In position 14, the amino acid is
preferentially Thr, Cys or Asn. In position 18, the amino acid
is preferentially Gly or Pro. In position 19, the amino acid
(if present) is preferentially Ser, Thr, Phe, Leu or Ala. In
position 20, the amino acid is preferentially Thr or Lys. In
position 24, the amino acid is preferentially Gln or Ile. In
position 26, the amino acid is preferentially Gln, Asp or Ala.
In position 27, the amino acid is preferentially Trp, Phe or
Tyr. In position 29, the amino acid is preferentially Ser, His
or Tyr. In position 32, the amino acid is preferentially Leu,
Ile, Gln, Val or Thr.
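
By way of illustration only, the consensus sequence and the positional preferences listed above may be expressed as a small Python check. This is a sketch and not part of the disclosed method: the function names are hypothetical, and it is assumed here that a candidate sequence may stop at position 32 (several of the listed examples do) and that a hyphen at a variable position denotes a gap.

```python
# Consensus CBD sequence as given in the text; "Xaa" marks a variable position.
CONSENSUS = (
    "Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa "
    "Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa Xaa"
).split()

# Preferred residues per 1-based position, transcribed from the paragraph above.
PREFERRED = {
    1: {"Trp", "Tyr"}, 2: {"Gly", "Ala"}, 7: {"Gln", "Ile", "Asn"},
    8: {"Gly", "Asn"}, 9: {"Trp", "Phe", "Tyr"},
    10: {"Ser", "Asn", "Thr", "Gln"}, 12: {"Pro", "Ala", "Cys"},
    13: {"Thr", "Arg", "Lys"}, 14: {"Thr", "Cys", "Asn"},
    18: {"Gly", "Pro"}, 19: {"Ser", "Thr", "Phe", "Leu", "Ala"},
    20: {"Thr", "Lys"}, 24: {"Gln", "Ile"}, 26: {"Gln", "Asp", "Ala"},
    27: {"Trp", "Phe", "Tyr"}, 29: {"Ser", "His", "Tyr"},
    32: {"Leu", "Ile", "Gln", "Val", "Thr"},
}


def matches_consensus(sequence):
    """Return True if every fixed consensus position is matched."""
    residues = sequence.split()
    # Assumption: position 33 may be absent, so 32 or 33 residues are accepted.
    if not 32 <= len(residues) <= len(CONSENSUS):
        return False
    return all(t == "Xaa" or t == r for t, r in zip(CONSENSUS, residues))


def non_preferred_positions(sequence):
    """List the 1-based positions whose residue is outside the preferred set."""
    residues = sequence.split()
    return [
        pos
        for pos, res in enumerate(residues, start=1)
        # A hyphen is treated as a gap and is not reported.
        if pos in PREFERRED and res != "-" and res not in PREFERRED[pos]
    ]


# First example sequence listed in this specification.
EXAMPLE = (
    "Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu "
    "Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys Leu"
)
print(matches_consensus(EXAMPLE), non_preferred_positions(EXAMPLE))
```

Such a check only tests agreement with the written consensus; it says nothing about whether a sequence actually binds cellulose.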

Examples of specific CBD-containing enzymes of the invention
are those which comprise one of the following amino acid
sequences

Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu
Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Trp Asn Gly Pro Thr Thr Cys Val
Ser Gly Ala Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Phe Asn Gly Pro Thr Cys Cys Gln
Ser Gly Ser Thr Cys Val Lys Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr Cys Ala
Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln Cys Thr
Pro;

Trp Gly Gln Cys Gly Gly Gln Gly Trp Gln Gly Pro Thr Cys Cys Ser
Gln Gly - Thr Cys Arg Ala Gln Asn Gln Trp Tyr Ser Gln Cys Leu
Asn;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Thr Asn Cys Glu
Ala Gly Ser Thr Cys Arg Gln Gln Asn Ala Tyr Tyr Ser Gln Cys
Ile;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Arg Asn Cys Glu
Ser Gly Ser Thr Cys Arg Ala Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr Cys Val
Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Gln Asn Tyr Ser Gly Pro Thr Thr Cys Lys
Ser Pro Phe Thr Cys Lys Lys Ile Asn Asp Phe Tyr Ser Gln Cys
Gln; or

Trp Gly Gln Cys Gly Gly Asn Gly Trp Thr Gly Ala Thr Thr Cys Ala
Ser Gly Leu Lys Cys Glu Lys Ile Asn Asp Trp Tyr Tyr Gln Cys Val

The cellulose- or hemicellulose-degrading enzyme of the
invention may further comprise an amino acid sequence which
defines a linking B region (to use the nomenclature established
for T. reesei cellulases) adjoining the carbohydrate binding
domain and connecting it to the catalytically active domain of
the enzyme. The B region sequences established so far for
enzymes of the invention indicate that such sequences are
characterized by being predominantly hydrophilic and uncharged,
and by being enriched in certain amino acids, in particular
glycine and/or asparagine and/or proline and/or serine and/or
threonine and/or glutamine. This characteristic structure of
the B region imparts flexibility to the sequence, in particular
in sequences containing short, repetitive units of primarily
glycine and asparagine. Such repeats are not found in the B
region sequences of T. reesei or P. chrysosporium which contain
B regions of the serine/threonine type. The flexible structure
is believed to facilitate the action of the catalytically
active domain of the enzyme bound by the A region to the
insoluble substrate, and therefore imparts advantageous
properties to the enzyme of the invention.
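
The enrichment described above can be made concrete by counting residues. The following Python sketch is illustrative only: the function name is hypothetical, and the sequence used is one of the B region sequences given in this specification. It computes the fraction of residues belonging to the set glycine, asparagine, proline, serine, threonine and glutamine.

```python
from collections import Counter

# Amino acids in which the B regions are said to be enriched.
ENRICHED = {"Gly", "Asn", "Pro", "Ser", "Thr", "Gln"}


def enriched_fraction(sequence):
    """Fraction of residues in a three-letter-code sequence that are in ENRICHED."""
    residues = sequence.replace(";", " ").split()
    counts = Counter(residues)
    enriched = sum(n for residue, n in counts.items() if residue in ENRICHED)
    return enriched / len(residues)


# A glycine/asparagine-repeat B region listed in this specification.
B_REGION = (
    "Gly Gly Ser Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn "
    "Asn Gly Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly "
    "Gly Asn Thr Gly Gly Gly Ser Ala Pro Leu"
)
print(f"{enriched_fraction(B_REGION):.2f}")
```

For this sequence only the alanine and leucine residues fall outside the enriched set, which agrees with the description of the region as predominantly hydrophilic and uncharged.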

Specific examples of B regions contained in enzymes of the
invention have the following amino acid sequences

Ala Arg Thr Asn Val Gly Gly Gly Ser Thr Gly Gly Gly Asn Asn Gly
Gly Gly Asn Asn Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro
Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Cys
Ser Pro Leu;

Pro Gly Gly Asn Asn Asn Asn Pro Pro Pro Ala Thr Thr Ser Gln Trp
Thr Pro Pro Pro Ala Gln Thr Ser Ser Asn Pro Pro Pro Thr Gly Gly
Gly Gly Gly Asn Thr Leu His Glu Lys;

Gly Gly Ser Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn
Asn Gly Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly
Gly Asn Thr Gly Gly Gly Ser Ala Pro Leu;

Val Phe Thr Cys Ser Gly Asn Ser Gly Gly Gly Ser Asn Pro Ser Asn
Pro Asn Pro Pro Thr Pro Thr Thr Phe Ile Thr Gln Val Pro Asn Pro
Thr Pro Val Ser Pro Pro Thr Cys Thr Val Ala Lys;

Pro Ala Leu Trp Pro Asn Asn Asn Pro Gln Gln Gly Asn Pro Asn Gln
Gly Gly Asn Asn Gly Gly Gly Asn Gln Gly Gly Gly Asn Gly Gly Cys
Thr Val Pro Lys;

Pro Gly Ser Gln Val Thr Thr Ser Thr Thr Ser Ser Ser Ser Thr Thr
Ser Arg Ala Thr Ser Thr Thr Ser Ala Gly Gly Val Thr Ser Ile Thr
Thr Ser Pro Thr Arg Thr Val Thr Ile Pro Gly Gly Ala Ser Thr Thr
Ala Ser Tyr Asn;

Glu Ser Gly Gly Gly Asn Thr Asn Pro Thr Asn Pro Thr Asn Pro Thr
Asn Pro Thr Asn Pro Thr Asn Pro Trp Asn Pro Gly Asn Pro Thr Asn
Pro Gly Asn Pro Gly Gly Gly Asn Gly Gly Asn Gly Gly Asn Cys Ser
Pro Leu; or

Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser Pro Val Asn Gln
Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr Ser Ser Pro Pro
Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala Glu Arg

In another aspect, the present invention relates to a
carbohydrate binding domain homologous to a terminal A region
of Trichoderma reesei cellulases, which carbohydrate binding
domain comprises the following amino acid sequence

1                                   10
Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa

            20                                      30
Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa

Xaa

or a subsequence thereof capable of effecting binding of a
protein to an insoluble cellulosic or hemicellulosic substrate.

Examples of specific carbohydrate binding domains are those
with the amino acid sequence indicated above.

In a further aspect, the present invention relates to a linking
B region derived from a cellulose- or hemicellulose-degrading
enzyme, said region comprising an amino acid sequence enriched
in the amino acids glycine and/or asparagine and/or proline
and/or serine and/or threonine and/or glutamine. As indicated
above, these amino acids may often occur in short, repetitive
units. Examples of specific B region sequences are those shown
above.

The present invention provides a unique opportunity to
"shuffle" the various regions of different cellulose- or
hemicellulose-degrading enzymes, thereby creating novel
combinations of the CBD, B region and catalytically active
domain resulting in novel activity profiles for this type of
enzyme. Thus, the enzyme of the invention may be one which
comprises an amino acid sequence defining a CBD, which amino
acid sequence is derived from one naturally occurring
cellulose- or hemicellulose-degrading enzyme, an amino acid
sequence defining a linking B region, which amino acid sequence
is derived from another naturally occurring cellulose- or
hemicellulose-degrading enzyme, as well as a catalytically
active domain derived from the enzyme supplying either the CBD
or the B region or from a third enzyme. In a particular
embodiment, the catalytically active domain is derived from an
enzyme which does not, in nature, comprise any CBD or B region.
In this way, it is possible to construct enzymes with improved
binding properties from enzymes which lack the CBD and B
regions.

The enzyme of the invention is preferably a cellulase such as
an endoglucanase (capable of hydrolysing amorphous regions of
low crystallinity in cellulose fibres), a cellobiohydrolase
(also known as an exoglucanase, capable of initiating
degradation of cellulose from the non-reducing chain ends by
removing cellobiose units) or a β-glucosidase.

In a still further aspect, the present invention relates to a
DNA construct which comprises a DNA sequence encoding a
cellulose- or hemicellulose-degrading enzyme as described
above.

A DNA sequence encoding the present enzyme may, for instance,
be isolated by establishing a cDNA or genomic library of a
microorganism known to produce cellulose- or hemicellulose-
degrading enzymes, such as a strain of Humicola, Fusarium or
Myceliophthora, and screening for positive clones by conventional
procedures such as by hybridization to oligonucleotide probes
synthesized on the basis of the full or partial amino acid
sequence of the enzyme or probes based on the partial or full
DNA sequence of the A region from T. reesei cellulases, as
indicated above, or by selecting for clones expressing the
appropriate enzyme activity, or by selecting for clones
producing a protein which is reactive with an antibody raised
against a native cellulose- or hemicellulose-degrading enzyme.

Alternatively, the DNA sequence encoding the enzyme may be
prepared synthetically by established standard methods, e.g.
the phosphoramidite method described by S.L. Beaucage and M.H.
Caruthers, Tetrahedron Letters 22, 1981, pp. 1859-1869, or the
method described by Matthes et al., The EMBO J. 3, 1984, pp.
801-805. According to the phosphoramidite method,
oligonucleotides are synthesized, e.g. in an automatic DNA
synthesizer, purified, annealed, ligated and cloned in
appropriate vectors.

Finally, the DNA sequence may be of mixed genomic and
synthetic, mixed synthetic and cDNA or mixed genomic and cDNA
origin prepared by ligating fragments of synthetic, genomic or
cDNA origin (as appropriate), the fragments corresponding to
various parts of the entire DNA construct, in accordance with
standard techniques. Thus, it may be envisaged that a DNA
sequence encoding the CBD of the enzyme may be of genomic
origin, while the DNA sequence encoding the B region of the
enzyme may be of synthetic origin, or vice versa; the DNA
sequence encoding the catalytically active domain of the enzyme
may conveniently be of genomic or cDNA origin. The DNA
construct may also be prepared by polymerase chain reaction
using specific primers, for instance as described in US
4,683,202 or R.K. Saiki et al., Science 239, 1988, pp. 487-491.

The present invention also relates to an expression vector
which carries an inserted DNA construct as described above. The
expression vector may suitably comprise appropriate promoter,
operator and terminator sequences permitting the enzyme to be
expressed in a particular host organism, as well as an origin
of replication enabling the vector to replicate in the host
organism in question.

The resulting expression vector may then be transformed into a
suitable host cell, such as a fungal cell, a preferred example
of which is a species of Aspergillus, most preferably
Aspergillus oryzae or Aspergillus niger. Fungal cells may be
transformed by a process involving protoplast formation and
transformation of the protoplasts followed by regeneration of
the cell wall in a manner known per se. The use of Aspergillus
as a host microorganism is described in EP 238,023 (of Novo
Industri A/S), the contents of which are hereby incorporated by
reference.

Alternatively, the host organism may be a bacterium, in
particular strains of Streptomyces and Bacillus, and E. coli.
The transformation of bacterial cells may be performed
according to conventional methods, e.g. as described in
Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor, 1989.

The screening of appropriate DNA sequences and construction of
vectors may also be carried out by standard procedures, cf.
Sambrook et al., op. cit.

The invention further relates to a method of producing a
cellulose- or hemicellulose-degrading enzyme as described
above, wherein a cell transformed with the expression vector of
the invention is cultured under conditions conducive to the
production of the enzyme, and the enzyme is subsequently
recovered from the culture. The medium used to culture the
transformed host cells may be any conventional medium suitable
for growing the host cells in question. The expressed enzyme
may conveniently be secreted into the culture medium and may be
recovered therefrom by well-known procedures including
separating the cells from the medium by centrifugation or
filtration, precipitating proteinaceous components of the
medium by means of a salt such as ammonium sulphate, followed
by chromatographic procedures such as ion exchange
chromatography, affinity chromatography, or the like.

By employing recombinant DNA techniques as indicated above,
techniques of fermentation and mutation or other techniques
which are well known in the art, it is possible to provide
cellulose- or hemicellulose-degrading enzymes of a high purity
and in a high yield.

The present invention further relates to an agent for degrading
cellulose or hemicellulose, the agent comprising a cellulose-
or hemicellulose-degrading enzyme as described above. It is
contemplated that, dependent on the specificity of the enzyme,
it may be employed for one (or possibly more) of the
applications mentioned above. In a particular embodiment, the
agent may comprise a combination of two or more enzymes of the
invention or a combination of one or more enzymes of the
invention with one or more other enzymes with cellulose- or
hemicellulose-degrading activity.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 shows the construction of plasmid pSX224;

Fig. 2 shows the construction of plasmid pHW485;

Fig. 3 shows the construction of plasmids pHW697 and pHW704;

Fig. 4 shows the construction of plasmid pHW768;

Fig. 5 is a restriction map of plasmid pSX320;

Fig. 6 shows the construction of plasmid pSX777;

Fig. 7 shows the construction of plasmid pCaHj170;

Fig. 8 shows the construction of plasmid IM4;

Fig. 9 shows the SOE fusion of the ~43kD endoglucanase signal
peptide and the N-terminal of Endo 1;

Fig. 10 shows the construction of plasmid pCaHj180;

Fig. 11 shows the DNA sequence and derived amino acid sequence
of F. oxysporum C-family cellobiohydrolase;

Fig. 12 shows the DNA sequence and derived amino acid sequence
of F. oxysporum F-family cellulase;

Fig. 13 shows the DNA sequence and derived amino acid sequence
of F. oxysporum C-family endoglucanase;

Fig. 14A-E shows the DNA sequence and derived amino acid
sequence of H. insolens endoglucanase 1 (EG1); and

Fig. 15A-D shows the DNA sequence and derived amino acid
sequence of a fusion of the B. lautus (NCIMB 40250) Endo 1
catalytic domain and the CBD and B region of H. insolens ~43kD
endoglucanase.

The invention is further illustrated in the following examples
which are not in any way intended to limit the scope of the
invention as claimed.



Example 1

Isolation of A region-containing clones from H. insolens

From H. insolens strain DSM 1800 (described in, e.g. WO
89/09259) grown on cellulose, mRNA was prepared according to
the method described by Kaplan et al., Biochem. J. 183 (1979)
181-184. A cDNA library containing 20,000 clones was obtained
substantially by the method of Okayama and Berg, Methods in
Enzymology 154, 1987, pp. 3-28.

The cDNA library was screened as described by Gergen et al.,
Nucl. Acids Res. 7(8), 1979, pp. 2115-2136, with
oligonucleotide probes in the antisense configuration, designed
according to the published sequences of the N-terminal part of
the A-region of the four T. reesei cellulase genes (Penttila et
al., Gene 45 (1986), 253-63; Saloheimo et al., Gene 63 (1988),
11-21; Shoemaker et al., Biotechnology, October 1983, 691-696;
Teeri et al., Gene 51 (1987) 43-52). The probe sequences were as
follows:

NOR-804   5'-CTT GCA CCC GCT GTA CCC AAT GCC ACC GCA CTG CCC
(~EG 1)   CCA-3'

NOR-805   5'-CGT GGG GCC GCT GTA GCC AAT ACC GCC GCA CTG GCC
(~CBH 1)  GTA-3'

NOR-807   5'-AGT CGG ACC CGA CCA ATT CTG GCC ACC ACA TTG GCC
(~CBH 2)  CCA-3'

NOR-808   5'-CGT AGG TCC GCT CCA ACC AAT ACC TCC ACA CTG GCC
(~EG 3)   CCA-3'
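
Because the probes are written in the antisense configuration, their relation to the A region is easiest to see after taking the reverse complement and translating with the standard genetic code. The Python sketch below is illustrative only (the helper names are hypothetical) and applies this to the first probe as printed above.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons in reading frame 1 to one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Probe NOR-804 (antisense), 5' to 3', spaces removed.
NOR_804 = "CTT GCA CCC GCT GTA CCC AAT GCC ACC GCA CTG CCC CCA".replace(" ", "")

sense = reverse_complement(NOR_804)
peptide = translate(sense)
print(sense)
print(peptide)
```

The translated sense strand begins Trp Gly Gln Cys Gly Gly, i.e. the conserved N-terminal motif of the carbohydrate binding domain, which is what makes such probes suitable for detecting A region-containing clones.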

Screening yielded a large number of candidates hybridising well
to the A-region probes. Restriction mapping reduced the number
of interesting clones to 17, of which 8 have so far been
sequenced (as described by Haltiner et al., Nucl. Acids Res.
13, 1985, pp. 1015-1025) sufficiently to confirm the presence
of a terminal CBD as well as a B-region.

The deduced amino acid sequences obtained for the CBDs were as
follows:

A-1: Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys
Cys Glu Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln
Cys Leu;

A-5: Trp Gly Gln Cys Gly Gly Ile Gly Trp Asn Gly Pro Thr Thr
Cys Val Ser Gly Ala Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln
Cys Leu;

CBH-2: Trp Gly Gln Cys Gly Gly Ile Gly Phe Asn Gly Pro Thr Cys
Cys Gln Ser Gly Ser Thr Cys Val Lys Gln Asn Asp Trp Tyr Ser Gln
Cys Leu;

A-8: Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr
Cys Ala Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln
Cys Thr Pro;

A-9: Trp Gly Gln Cys Gly Gly Gln Gly Trp Gln Gly Pro Thr Cys
Cys Ser Gln Gly - Thr Cys Arg Ala Gln Asn Gln Trp Tyr Ser Gln
Cys Leu Asn;

A-11: Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Thr Asn
Cys Glu Ala Gly Ser Thr Cys Arg Gln Gln Asn Ala Tyr Tyr Ser Gln
Cys Ile;

A-19: Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Arg Asn
Cys Glu Ser Gly Ser Thr Cys Arg Ala Gln Asn Asp Trp Tyr Ser Gln
Cys Leu; and

~43 kD: Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr
Cys Val Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln
Cys Leu

The deduced amino acid sequences obtained for the B region were
as follows:

A1: Ala Arg Thr Asn Val Gly Gly Gly Ser Thr Gly Gly Gly Asn
Asn Gly Gly Gly Asn Asn Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly
Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly
Asn Cys Ser Pro Leu;

A5: Pro Gly Gly Asn Asn Asn Asn Pro Pro Pro Ala Thr Thr Ser
Gln Trp Thr Pro Pro Pro Ala Gln Thr Ser Ser Asn Pro Pro Pro Thr
Gly Gly Gly Gly Gly Asn Thr Leu His Glu Lys;

A8: Gly Gly Ser Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly Gly
Asn Asn Asn Gly Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn Asn
Gly Gly Gly Asn Thr Gly Gly Gly Ser Ala Pro Leu;

A11: Val Phe Thr Cys Ser Gly Asn Ser Gly Gly Gly Ser Asn Pro
Ser Asn Pro Asn Pro Pro Thr Pro Thr Thr Phe Ile Thr Gln Val Pro
Asn Pro Thr Pro Val Ser Pro Pro Thr Cys Thr Val Ala Lys;

A19: Pro Ala Leu Trp Pro Asn Asn Asn Pro Gln Gln Gly Asn Pro
Asn Gln Gly Gly Asn Asn Gly Gly Gly Asn Gln Gly Gly Gly Asn Gly
Gly Cys Thr Val Pro Lys;

CBH2: Pro Gly Ser Gln Val Thr Thr Ser Thr Thr Ser Ser Ser Ser
Thr Thr Ser Arg Ala Thr Ser Thr Thr Ser Ala Gly Gly Val Thr Ser
Ile Thr Thr Ser Pro Thr Arg Thr Val Thr Ile Pro Gly Gly Ala Ser
Thr Thr Ala Ser Tyr Asn;

A9: Glu Ser Gly Gly Gly Asn Thr Asn Pro Thr Asn Pro Thr Asn
Pro Thr Asn Pro Thr Asn Pro Thr Asn Pro Trp Asn Pro Gly Asn Pro
Thr Asn Pro Gly Asn Pro Gly Gly Gly Asn Gly Gly Asn Gly Gly Asn
Cys Ser Pro Leu; or

Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser Pro Val Asn Gln
Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr Ser Ser Pro Pro
Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala Glu Arg

Example 2 

Expression in A. oryzae of a CBH 2-type cellulase from H. insolens

The complete sequence of one of the CBD clones shows a striking
similarity to a cellobiohydrolase (CBH 2) from T. reesei.

The construction of the expression vector pSX224 carrying the
H. insolens CBH 2 gene for expression in and secretion from A.
oryzae is outlined in Fig. 1. The vector p777 containing the
pUC 19 replicon and the regulatory regions of the TAKA amylase
promoter from A. oryzae and glucoamylase terminator from A.
niger is described in EP 238 023. pSX217 is composed of the
cloning vector pcDV1-pL1 (cf. Okayama and Berg, op. cit.)
carrying the H. insolens CBH 2 gene on a 1.8 kb fragment. The
CBH 2 gene contains three restriction sites used in the
construction: a BalI site at the initiating methionine codon in
the signal sequence, a BstBI site 620 bp downstream from the
BalI site and an AvaII site 860 bp downstream from the BstBI
site. The AvaII site is located in the non-translated C-terminal
part of the gene upstream of the poly A region, which
is not wanted in the final construction. Nor is the poly G
region upstream of the gene in the cloning vector. This region
is excised and replaced by an oligonucleotide linker which
places the translational start codon close to the BamHI site at
the end of the TAKA promoter.

The expression vector pSX224 was transformed into A. oryzae
IFO 4177 using the amdS gene from A. nidulans as the selective
marker as described in EP 238 023. Transformants were grown in
YPD medium (Sherman et al., Methods in Yeast Genetics, Cold
Spring Harbor Laboratory, 1981) for 3-4 days and analysed for
new protein species in the supernatant by sodium dodecyl
sulphate polyacrylamide gel electrophoresis. The CBH 2 from H.
insolens formed a band with an apparent Mw of 65 kD indicating
a substantial glycosylation of the protein chain, which is
calculated to have a Mw of 51 kD on the basis of the amino acid
composition. The intact enzyme binds well to cellulose, while
enzymatic degradation products of 55 kD and 40 kD do not bind,
indicating removal of the A-region and possibly the B-region.
The enzyme has some activity towards filter paper, giving rise
to release of glucose. As expected, it has very limited
endoglucanase activity as measured on soluble cellulose in the
form of carboxymethyl cellulose.

Example 3

Isolation of Fusarium oxysporum genomic DNA

A freeze-dried culture of Fusarium oxysporum was reconstituted
with phosphate buffer, spotted 5 times on each of 5 FOX medium
plates (6% yeast extract, 1.5% K2HPO4, 0.75% MgSO4·7H2O, 22.5%
glucose, 1.5% agar, pH 5.6) and incubated at 37°C. After 6 days
of incubation the colonies were scraped from the plates into 15
ml of 0.001% Tween-80 which resulted in a thick and cloudy
suspension.

Four 1-liter flasks, each containing 300 ml of liquid FOX
medium, were inoculated with 2 ml of the spore suspension and
were incubated at 30°C and 240 rpm. On the 4th day of
incubation, the cultures were filtered through 4 layers of
sterile gauze and washed with sterile water. The mycelia were
dried on Whatman filter paper, frozen in liquid nitrogen,
ground into a fine powder in a cold mortar and added to 75 ml
of fresh lysis buffer (10 mM Tris-Cl 7.4, 1% SDS, 50 mM EDTA,
100 µl DEPC). The thoroughly mixed suspension was incubated in
a 65°C waterbath for 1 hour and then spun for 10 minutes at
4000 rpm and 5°C in a bench-top centrifuge. The supernatant
was decanted and EtOH precipitated. After 1 hour on ice the
solution was spun at 19,000 rpm for 20 minutes. The supernatant
was decanted and isopropanol precipitated. Following
centrifugation at 10,000 rpm for 10 minutes, the supernatant
was decanted and the pellets allowed to dry.

One milliliter of TER solution (10 mM Tris-HCl, pH 7.4, 1 mM
EDTA, 100 µg RNAse A) was added to each tube, and the tubes
were stored at 4°C for two days. The tubes were pooled and
placed in a 65°C waterbath for 30 minutes to suspend non-
dissolved DNA. The solution was extracted twice with
phenol/CHCl3/isoamyl alcohol, twice with CHCl3/isoamyl alcohol
and then ethanol precipitated. The pellet was allowed to settle
and the EtOH was removed. 70% EtOH was added and the DNA stored
overnight at -20°C. After decanting and drying, 1 ml of TER
was added and the DNA was dissolved by incubating the tubes at
65°C for 1 hour. The preparation yielded 1.5 mg of genomic DNA.

Amplification, cloning and sequencing of DNA amplified with
degenerate primers

To amplify DNA from C-family (according to the nomenclature of
Henrissat et al. Gene 81 (1), 1989, pp. 83-96) cellulases using
PCR (cf. US 4,683,195 and US 4,683,202) each "sense"
oligonucleotide was used in combination with each "antisense"
oligonucleotide. Thus, the following primer pair was used:



Primer 1     Primer 2

ZC3220       ZC3221

ZC3220: GCC AAC TAC GGT ACC GG(A/C/G/T) TA(C/T) TG(C/T) GA(C/T)
        (A/G/T)(C/G)(A/G/C/T) CA(G/A) TG

ZC3221: GCG TTG GCC TCT AGA AT(G/A) TCC AT(C/T) TC(A/G/C/T)
        (C/G/T)(A/T)(G/A) CA(G/A) CA

In the PCR reaction, 1 µg of Fusarium oxysporum genomic DNA was
used as the template. Ten times PCR buffer is 100 mM Tris-HCl pH
8.3, 500 mM KCl, 15 mM MgCl2, 0.1% gelatin (Perkin-Elmer Cetus).
The reactions contained the following ingredients:

dH2O               35.75 µl
10X PCR buffer      5    µl
template DNA        5    µl
primer 1            2    µl   (40 pmol)
primer 2            2    µl   (40 pmol)
Taq polymerase      0.25 µl   (1.25 U)
total              50    µl
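The volumes in the reaction table can be cross-checked with a few lines of arithmetic. The following is only a sketch: the dictionary keys and variable names are illustrative, and the only inputs are the volumes, the 40 pmol primer amounts and the 10X buffer composition stated in the text.

```python
# Volumes in microliters, as listed in the reaction table.
REACTION_UL = {
    "dH2O": 35.75,
    "10X PCR buffer": 5.0,
    "template DNA": 5.0,
    "primer 1": 2.0,
    "primer 2": 2.0,
    "Taq polymerase": 0.25,
}

total_ul = sum(REACTION_UL.values())

# Dilution of the 10X buffer in the final reaction (should come out at 1X).
buffer_fold = 10 * REACTION_UL["10X PCR buffer"] / total_ul

# 40 pmol of each primer in the final volume; pmol/µl is the same as µM.
primer_uM = 40 / total_ul

# The 10X buffer is stated to contain 15 mM magnesium chloride.
final_mgcl2_mM = 15 * REACTION_UL["10X PCR buffer"] / total_ul

print(f"total volume: {total_ul} µl")
print(f"buffer strength: {buffer_fold}X")
print(f"each primer: {primer_uM} µM")
print(f"MgCl2: {final_mgcl2_mM} mM")
```

The components add up to the stated 50 µl total, which places the buffer at 1X and each primer at a little under 1 µM.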



The PCR reactions were performed for 40 cycles under the
following conditions:

94°C 1.5 min

45°C 2.0 min

72°C 2.0 min



Five microliters of each reaction was analyzed by agarose gel
electrophoresis. The sizes of the DNA fragments were estimated
from DNA molecular weight markers. The reaction primed with
ZC3220 and ZC3221 produced two DNA fragments of appropriate
size to be candidates for fragments of C-family cellulases. The
agarose sections containing these two fragments were excised,
and the DNA was electroeluted and digested with the restriction
enzymes KpnI and XbaI. The fragments were ligated into the
vector pUC18 which had been cut with the same two restriction
enzymes. The ligations were transformed into E. coli and mini-
prep DNA was prepared from the resulting colonies. The DNA
sequences of these inserts were determined and revealed that
two new C-family cellulases had been identified, one a new
cellobiohydrolase and the other a new endoglucanase.

The PCR cloning strategy described above for the C-family
cellulases was applied using other primers which encoded
conserved cellulase sequences within the known F-family


WO 91/17244    PCT/DK91/00124

cellulases (cf. Henrissat et al., op. cit.). The following
primer pair was used for amplification of Fusarium genomic DNA.

Primer 1     Primer 2

ZC3226       ZC3227

ZC3226: TCC TGA CGC CAA GCT TT(A/G/T) (C/T)(A/T)(A/T)
        (A/C/T)AA (C/T)GA (C/T)TA (C/T)AA

ZC3227: CAC CGG CAC CAT CGA T(G/A)T C(A/C/G/T)A
        (G/A)(C/T)T C(A/G/C/T)G T(A/G/T)A T
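Each parenthesised group in the primer listings is a mixed position, so every "primer" is really a pool of distinct sequences. The sketch below counts the size of each pool. It assumes the primer strings as transcribed here from the listings (the transcription, the dictionary name and the function name are illustrative assumptions, not part of the original disclosure).

```python
import re
from math import prod

# Primer strings as read from the listings; each (X/Y/...) group is one
# degenerate position. The transcription is an assumption.
PRIMERS = {
    "ZC3220": "GCC AAC TAC GGT ACC GG(A/C/G/T) TA(C/T) TG(C/T) GA(C/T) "
              "(A/G/T)(C/G)(A/G/C/T) CA(G/A) TG",
    "ZC3221": "GCG TTG GCC TCT AGA AT(G/A) TCC AT(C/T) TC(A/G/C/T) "
              "(C/G/T)(A/T)(G/A) CA(G/A) CA",
    "ZC3226": "TCC TGA CGC CAA GCT TT(A/G/T) (C/T)(A/T)(A/T) "
              "(A/C/T)AA (C/T)GA (C/T)TA (C/T)AA",
    "ZC3227": "CAC CGG CAC CAT CGA T(G/A)T C(A/C/G/T)A "
              "(G/A)(C/T)T C(A/G/C/T)G T(A/G/T)A T",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in a degenerate primer pool."""
    groups = re.findall(r"\(([ACGT/]+)\)", primer)
    # Multiply the number of alternative bases at every mixed position.
    return prod(len(set(group.split("/"))) for group in groups)


for name, sequence in PRIMERS.items():
    print(name, degeneracy(sequence))
```

Under this reading the pools contain between a few hundred and about fifteen hundred sequences each, which is one reason relatively low annealing temperatures (45°C and 50°C) are plausible for these reactions.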

The PCR reactions were performed for 40 cycles as follows:

94°C 1.5 min

50°C 2.0 min

72°C 2.0 min

The 180 bp band was eluted from an agarose gel fragment,
digested with the restriction enzymes Hind III and Cla I and
ligated into pUC19 which had been digested with Hind III and
AccI. The ligated DNA was transformed into E. coli and mini-
prep DNA was prepared from colony isolates. The DNA sequence of
the cloned DNA was determined. This fragment encoded sequences
corresponding to a new member of the F-family cellulases.
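The cloning steps rely on restriction sites carried in the fixed 5' portion of each primer. The sketch below locates them, assuming the non-degenerate 5' ends as transcribed from the primer listings and the standard recognition sequences of KpnI, XbaI, HindIII and ClaI; the dictionary and function names are illustrative.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {
    "KpnI": "GGTACC",
    "XbaI": "TCTAGA",
    "HindIII": "AAGCTT",
    "ClaI": "ATCGAT",
}

# Non-degenerate 5' portion of each primer (transcription is an assumption).
FIXED_5PRIME = {
    "ZC3220": "GCCAACTACGGTACCGG",
    "ZC3221": "GCGTTGGCCTCTAGAAT",
    "ZC3226": "TCCTGACGCCAAGCTTT",
    "ZC3227": "CACCGGCACCATCGAT",
}


def find_sites(sequence: str) -> dict:
    """Map enzyme name -> 0-based start index for every site present."""
    return {
        enzyme: sequence.find(site)
        for enzyme, site in SITES.items()
        if site in sequence
    }


for name, sequence in FIXED_5PRIME.items():
    print(name, find_sites(sequence))
```

Each primer carries exactly one of the four sites, matching the enzyme pairs used to clone the C-family (KpnI/XbaI) and F-family (Hind III/Cla I) fragments.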

Construction of a Fusarium oxysporum cDNA library

Fusarium oxysporum was grown by fermentation and samples were
withdrawn at various times for RNA extraction and cellulase
activity analysis. The activity analysis included an assay for
total cellulase activity as well as one for colour
clarification. Fusarium oxysporum samples demonstrating maximal
colour clarification were extracted for total RNA from which
poly(A)+ RNA was isolated.

To construct a Fusarium oxysporum cDNA library, first-strand
cDNA was synthesized in two reactions, one with and the other
without radiolabelled dATP. A 2.5X reaction mixture was
prepared at room temperature by mixing the following reagents
in the following order: 10 µl of 5X reverse transcriptase
buffer (Gibco-BRL, Gaithersburg, Maryland), 2.5 µl 200 mM
dithiothreitol (made fresh or from a stock solution stored at
-70°C), and 2.5 µl of a mixture containing 10 mM of each
deoxynucleotide triphosphate (dATP, dGTP, dTTP and 5-methyl
dCTP, obtained from Pharmacia LKB Biotechnology, Alameda, CA).
The reaction mixture was divided into each of two tubes of 7.5
µl. 1.3 µl of 10 µCi/µl 32P α-dATP (Amersham, Arlington
Heights, IL) was added to one tube and 1.3 µl of water to the
other. Seven microliters of each mixture was transferred to
final reaction tubes. In a separate tube, 5 µg of Fusarium
oxysporum poly(A)+ RNA in 14 µl of 5 mM Tris-HCl pH 7.4, 50 µM
EDTA was mixed with 2 µl of 1 µg/µl first strand primer (ZC2938
GACAGAGCACAGAATTCACTAGTGAGCTC(T)15). The RNA-primer mixture was
heated at 65°C for 4 minutes, chilled in ice water, and
centrifuged briefly in a microfuge. Eight microliters of the
RNA-primer mixture was added to the final reaction tubes. Five
microliters of 200 U/µl Superscript™ reverse transcriptase
(Gibco-BRL) was added to each tube. After gentle agitation, the
tubes were incubated at 45°C for 30 minutes. Eighty microliters
of 10 mM Tris-HCl pH 7.4, 1 mM EDTA was added to each tube, the
samples were vortexed, and briefly centrifuged. Three
microliters was removed from each tube to determine counts
incorporated by TCA precipitation and the total counts in the
reaction. A 2 µl sample from each tube was analyzed by gel
electrophoresis. The remainder of each sample was ethanol
precipitated in the presence of oyster glycogen. The nucleic
acids were pelleted by centrifugation, and the pellets were
washed with 80% ethanol. Following the ethanol wash, the
samples were air dried for 10 minutes. The first strand
synthesis yielded 1.6 µg of Fusarium oxysporum cDNA, a 33%
conversion of poly(A)+ RNA into DNA.

Second strand cDNA synthesis was performed on the RNA-DNA
hybrid from the first strand reactions under conditions which
encouraged first strand priming of second strand synthesis
resulting in hairpin DNA. The first strand products from each
of the two first strand reactions were resuspended in 71 µl of
water. The following reagents were added, at room temperature,
to the reaction tubes: 20 µl of 5X second strand buffer (100 mM
Tris pH 7.4, 450 mM KCl, 23 mM MgCl2, and 50 mM (NH4)2SO4), 3
µl of 5 mM β-NAD, and µl of a deoxynucleotide triphosphate
mixture with each at 10 mM. One microliter of α-32P dATP was
added to the reaction mixture which received unlabeled dATP for
the first strand synthesis while the tube which received
labeled dATP for first strand synthesis received 1 µl of water.
Each tube then received 0.6 µl of 7 U/µl E. coli DNA ligase
(Boehringer-Mannheim, Indianapolis, IN), 3.1 µl of 8 U/µl E.
coli DNA polymerase I (Amersham), and 1 µl of 2 U/µl RNase H
(Gibco-BRL). The reactions were incubated at 16°C for 2 hours.
After incubation, 2 µl from each reaction was used to determine
TCA precipitable counts and total counts in the reaction, and
2 µl from each reaction was analyzed by gel electrophoresis. To
the remainder of each sample, 2 µl of 2.5 µg/µl oyster
glycogen, 5 µl of 0.5 M EDTA and 200 µl of 10 mM Tris-HCl pH
7.4, 1 mM EDTA were added. The samples were phenol-chloroform
extracted and isopropanol precipitated. After centrifugation
the pellets were washed with 100 µl of 80% ethanol and air
dried. The yield of double stranded cDNA in each of the
reactions was approximately 2.5 µg.

Mung bean nuclease treatment was used to clip the single-
stranded DNA of the hair-pin. Each cDNA pellet was resuspended
in 15 µl of water and 2.5 µl of 10X mung bean buffer (0.3 M
NaAc pH 4.6, 3 M NaCl, and 10 mM ZnSO4), 2.5 µl of 10 mM DTT,
2.5 µl of 50% glycerol, and 2.5 µl of 10 U/µl mung bean
nuclease (New England Biolabs, Beverly, MA) were added to each
tube. The reactions were incubated at 30°C for 30 minutes and
75 µl of 10 mM Tris-HCl pH 7.4 and 1 mM EDTA was added to each
tube. Two-microliter aliquots were analyzed by alkaline agarose
gel analysis. One hundred microliters of 1 M Tris-HCl pH 7.4
was added to each tube and the samples were phenol-chloroform
extracted twice. The DNA was isopropanol precipitated and
pelleted by centrifugation. After centrifugation, the DNA
pellet was washed with 80% ethanol and air dried. The yield was
approximately 2 µg of DNA from each of the two reactions.

The cDNA ends were blunted by treatment with T4 DNA polymerase.
DNA from the two samples was combined after resuspension in a
total volume of 24 µl of water. Four microliters of 10X T4
buffer (330 mM Tris-acetate pH 7.9, 670 mM KAc, 100 mM MgAc,
and 1 mg/ml gelatin), 4 µl of 1 mM dNTP, 4 µl 50 mM DTT, and 4
µl of 1 U/µl T4 DNA polymerase (Boehringer-Mannheim) were added
to the DNA. The samples were incubated at 15°C for 1 hour.
After incubation, 160 µl of 10 mM Tris-HCl pH 7.4, 1 mM EDTA
was added, and the sample was phenol-chloroform extracted. The
DNA was isopropanol precipitated and pelleted by
centrifugation. After centrifugation the DNA was washed with
80% ethanol and air dried.

After resuspension of the DNA in 6.5 µl water, Eco RI adapters
were added to the blunted DNA. One microliter of 1 µg/µl Eco RI
adapter (Invitrogen, San Diego, CA Cat. # N409-20), 1 µl of 10X
ligase buffer (0.5 M Tris pH 7.8 and 50 mM MgCl2), 0.5 µl of 10
mM ATP, 0.5 µl of 100 mM DTT, and 1 µl of 1 U/µl T4 DNA ligase
(Boehringer-Mannheim) were added to the DNA. After the sample
was incubated overnight at room temperature, the ligase was
heat denatured at 65°C for 15 minutes.

The Sst I cloning site encoded by the first strand primer was
exposed by digestion with Sst I endonuclease. Thirty-three
microliters of water, 5 µl of 10X Sst I buffer (0.5 M Tris pH
8.0, 0.1 M MgCl2, and 0.5 M NaCl), and 2 µl of 5 U/µl Sst I
were added to the DNA, and the samples were incubated at 37°C
for 2 hours. One hundred and fifty microliters of 10 mM Tris-
HCl pH 7.4, 1 mM EDTA was added, the sample was phenol-
chloroform extracted, and the DNA was isopropanol precipitated.

The cDNA was chromatographed on a Sepharose CL 2B (Pharmacia
LKB Biotechnology) column to size-select the cDNA and to remove
free adapters. A 1.1 ml column of Sepharose CL 2B was poured
into a 1 ml plastic disposable pipet and the column was washed
with 50 column volumes of buffer (10 mM Tris pH 7.4 and 1 mM
EDTA). The sample was applied, one-drop fractions were
collected, and the DNA in the void volume was pooled. The
fractionated DNA was isopropanol precipitated. After
centrifugation the DNA was washed with 80% ethanol and air
dried.

A Fusarium oxysporum cDNA library was established by ligating
the cDNA to the vector pYcDE8' (cf. WO 90/10698) which had been
digested with Eco RI and Sst I. Three hundred and ninety
nanograms of vector was ligated to 400 ng of cDNA in an 80 µl
ligation reaction containing 8 µl of 10X ligase buffer, 4 µl
of 10 mM ATP, 4 µl 200 mM DTT, and 1 unit of T4 DNA ligase
(Boehringer-Mannheim). After overnight incubation at room
temperature, 5 µg of oyster glycogen and 120 µl of 10 mM Tris-
HCl and 1 mM EDTA were added and the sample was phenol-
chloroform extracted. The DNA was ethanol precipitated,
centrifuged, and the DNA pellet washed with 80% ethanol. After
air drying, the DNA was resuspended in 3 µl of water. Thirty
seven microliters of electroporation competent DH10B cells
(Gibco-BRL) was added to the DNA, and electroporation was
completed with a Bio-Rad Gene Pulser (Model #1652076) and Bio-
Rad Pulse Controller (Model #1652098) electroporation unit
(Bio-Rad Laboratories, Richmond, CA). Four milliliters of SOC
(Hanahan, J. Mol. Biol. 166 (1983), 557-580) was added to the
electroporated cells, and 400 µl of the cell suspension was
spread on each of ten 150 mm LB ampicillin plates. After an
overnight incubation, 10 ml of LB amp media was added to each
plate, and the cells were scraped into the media. Glycerol
stocks and plasmid preparations were made from each plate. The
library background (vector without insert) was established at
approximately 1% by ligating the vector without insert and
titering the number of clones after electroporation.

Screening the cDNA library

Full length cellulase cDNA clones were isolated from the
Fusarium oxysporum cDNA library by hybridization to PCR
generated genomic oligonucleotide probes.

The PCR-generated oligonucleotides: ZC3309, a 40-mer coding for
part of the C family cellobiohydrolase, ATT ACC AAC ACC AGC GTT
GAC ATC ACT GTC AGA GGG CTT C; ZC3310, a 28-mer coding for the
C family endoglucanase, AAC TCC GTT GAT GAA AGG AGT GAC GTA G;
and ZC3311, a 40-mer coding for the F family cellulase, CGG AGA
GCA GCA GGA ACA CCA GAG GCA GGG TTC CAG CCA C, were end labeled
with T4 polynucleotide kinase and 32P gamma ATP. For the
kinase reaction 17 picomoles of each oligonucleotide were
brought up to 12.5 µl volume with deionized water. To these
were added 2 µl 10X kinase buffer (1X: 10 mM magnesium
chloride, 0.1 mM EDTA, 50 mM Tris pH 7.8), 0.5 µl 200 mM
dithiothreitol, 1 µl 32P gamma ATP (150 mCi/ml, Amersham), 2 µl
T4 polynucleotide kinase (10 U/µl, BRL). The samples were then
mixed and incubated at 37°C for 30 minutes. Oligonucleotides
were separated from unincorporated nucleotides by precipitation
with 180 µl TE (10 mM Tris pH 8., 1 mM EDTA), 100 µl 7.5 M
ammonium acetate, 2 µl mussel glycogen (20 mg/ml, Gibco-BRL)
and 750 µl 100% ethanol. Pellets were dissolved in 200 µl
distilled water. To determine the amount of radioactivity
incorporated in the oligonucleotides, 10 µl of 1:1000 dilutions
of oligonucleotides were read without scintillation fluid in a
Beckman LS 1800 Liquid Scintillation System. Activities were:
115 million cpm for ZC3309, 86 million cpm for ZC3310, and 79
million cpm for ZC3311.

Initially, a library of 20,000 cDNA clones was probed with a
mixture of each of the three oligonucleotides corresponding to
the C family cellobiohydrolase, C family endoglucanase and F
family cellulase clones. The cDNA library was plated out from
titered glycerol stocks stored at -70°C. Four thousand clones
were plated out on each of five 150 mm LB ampicillin (1000
µg/ml) plates. Lifts were taken in duplicate following standard
methodology (Sambrook et al., Molecular Cloning, 1989) using
Biotrans 0.2 µm 137 mm filters. The filters were baked at 80°C
in vacuum for 2 hours, then swirled overnight in a
crystallizing dish (Pharmacia LKB Biotechnology, Alameda, CA)
at 37°C in 80 ml prehybridization solution (5X Denhardt's (1X:
0.02% Ficoll, 0.02% polyvinylpyrrolidone, 0.02% bovine serum
albumin Pentax Fraction 5 (Sigma, St. Louis, MO)), 5X SSC (1X:
0.15 M sodium chloride, 0.15 M sodium citrate pH 7.3), 100
µg/ml denatured sonicated salmon sperm DNA, 50 mM sodium
phosphate pH 6.8, 1 mM sodium pyrophosphate, 100 µM ATP, 20%
formamide, 1% sodium dodecyl sulfate) (Ulrich et al., EMBO J. 3
(1984), 361-364).

Prehybridized filters were probed by adding them one at a time
into a crystallizing dish with 80 ml prehybridization solution
with 80 million cpm ZC3309, 86 million cpm ZC3310 and 79
million cpm ZC3311 and then swirled overnight at 37°C. Filters
were then washed to high stringency. The probed filters were
washed with three 400 ml volumes of low stringency wash
solution (2X SSC, 0.1% SDS) at room temperature in the
crystallizing dish, then with four 1-liter volumes in a plastic
box. A further wash for 20 minutes at 68°C with
tetramethylammonium chloride wash solution (TMACL: 3 M
tetramethylammonium chloride, 50 mM Tris-HCl pH 8.0, 2 mM EDTA,
1 g/l SDS) (Wood et al., Proc. Natl. Acad. Sci. 82 (1985),
1585-1588) provided a high stringency wash for the 28-mer
ZC3310 independent of its base composition. The filters
were then blotted dry, mounted on Whatman 3MM paper and covered
with plastic wrap for autoradiography. They were exposed
overnight at -70°C with intensifying screens and Kodak XAR-5
film.

Two putative positives appeared on duplicate filters. The
corresponding areas on the plates with colonies were picked
into 1 ml of 1X polymerase chain reaction (PCR) buffer (100 mM
Tris HCl pH 8.3, 500 mM KCl, 15 mM MgCl2, 0.1% gelatin; Perkin
Elmer Cetus) and plated out at five tenfold dilutions on 100 mm
LB plates with 70 µg/ml ampicillin. These plates were grown at
37°C overnight. Two dilutions of each putative clone were
chosen for rescreening as outlined above. One isolated clone,
pZFH196, was found. This was grown up overnight in 10 ml 2X YT
broth (per liter: 16 g bacto-tryptone, 10 g bacto-yeast
extract, 10 g NaCl). Twenty three micrograms of DNA were
purified by the rapid boiling method (Holmes and Quigley,
Anal. Biochem. 114 (1981), 193-197). From restriction analysis
the clone was found to be approximately 2,000 base pairs in
length. Sequence analysis showed it to contain a fragment
homologous to the C family cellobiohydrolase fragment cloned by
PCR.

In an attempt to isolate additional cellulase cDNA clones, a
cDNA library of 2 million clones was plated out on 20 150 mm LB
plates (100 µg/ml ampicillin), each containing approximately
100,000 cDNA clones. Lifts were taken in duplicate as in the
first screening attempt. This library was screened with
oligonucleotides corresponding to the three cellulase species
as described above except that the hybridization was carried
out with formamide in the prehybridization buffer and at a
temperature of 30°C. Washing with TMACL was carried out twice
for 20 minutes at 67°C. Between 8 and 20 signals were found on
duplicate filters of each of the 20 plates. Fifteen plugs were
taken from the first plate with the large end of a pasteur
pipet into 1 ml 1X PCR buffer (Perkin-Elmer Cetus). PCR was
carried out on the bacterial plugs with three separate
oligonucleotide mixtures. Each mixture contained the vector
specific oligonucleotide ZC2847 and additionally, a different
cellulase specific oligonucleotide (ZC3309, ZC3310 or ZC3311)
within each mixture. AmpliTaq polymerase (Perkin-Elmer Cetus)
was used with Pharmacia Ultrapure dNTP and following Perkin
Elmer Cetus procedures. Sixteen picomoles of each primer were
used in 40 µl reaction volumes. Twenty microliters of cells in
1X PCR buffer were added to 20 µl mastermix which contained
everything needed for PCR except for DNA. After an initial 1
minute 45 second denaturation at 94°C, 28 cycles of: 45 seconds
at 94°C, 1 minute at 45°C and 2 minutes at 72°C with a final
extension of 10 minutes at 72°C were employed in a Perkin Elmer
thermocycler. Ten of the 15 plugs yielded a band when primed
with the C family specific oligonucleotide ZC3309 and ZC2847.
The other mixtures gave no specific products. Five plugs which
produced the largest bands by PCR, therefore possibly being
full length C family cellobiohydrolases, along with the 5 plugs
which did not produce PCR bands, were plated out at five 10
fold dilutions onto 100 mm LB plates with 70 µg/ml ampicillin
and grown overnight. Duplicate lifts were taken of two ten fold
dilutions each. Prehybridization and hybridization were carried
out as described above with a mixture of the 3
oligonucleotides. Isolated clones were found on all 10 of the
platings. These were picked from the dilution plates with a
toothpick for single colony isolation on 100 mm LB plates with
70 µg/ml ampicillin. PCR was carried out on isolated bacterial
colonies with 2 oligonucleotides specific for the C family
cellobiohydrolase (ZC3409 (CCG TTC TGG ACG TAC AGA) and ZC3411
(TGA TGT CAA GTT CAT CAA)). Conditions were identical to those
described above except for using 10 picomoles of each primer in
25 µl reaction volumes. Colonies were added by toothpick into
PCR tubes with 25 µl mastermix before cycling. Five of the 10
gave strong bands of the size expected for a C family
cellobiohydrolase. Isolated colonies were then grown up in 20
ml of Terrific Broth (Sambrook et al., op. cit., A2) and DNA
was isolated by the rapid boiling method. The clones were
partially sequenced by Sanger dideoxy sequencing. From sequence
analysis the 5 clones which did not give bands specific for a
C family cellobiohydrolase by PCR were shown to be F family
cellulase clones.

In order to clone the C family endoglucanase, the cDNA library
of 2 million clones was rescreened with only ZC3310. Conditions
of prehybridization and hybridization were like those used
above. Filters were hybridized for 10 hours at 30°C with one
million cpm endlabeled ZC3310 per ml prehybridization solution
without formamide. Washing with TMACL was carried out 2 times
for 20 minutes at 60°C. Seven weak signals were found on
duplicate filters. Plugs were picked with the large end of a
pipet into 1 ml LB broth. These were each plated out in five 10
fold dilutions on 100 mm LB plates with 70 µg/ml ampicillin.
Duplicate lifts were taken of 2 dilutions each and were
processed as described above. Prehybridization, hybridization,
and washing were carried out as for the first level of
screening. Three isolated clones were identified and streaked
out for single colony hybridization. Isolates were grown
overnight in 50 ml of Terrific Broth (per liter: 12 g tryptone,
24 g yeast extract, 4 ml glycerol, autoclaved, and 100 ml of
0.17 M KH2PO4, 0.72 M K2HPO4 (Sambrook et al., op. cit., A2))
and DNA was isolated by alkaline lysis and PEG precipitation by
standard methods (Maniatis 1989, 1.38-1.41). From restriction
analysis, one clone (pZFH223) was longer than the others and
was chosen for complete sequencing. Sequence analysis showed it
to contain the PCR fragment cloned initially.



DNA sequence analysis

The cDNAs were sequenced in the yeast expression vector
pYcDE8'. The dideoxy chain termination method (F. Sanger et
al., Proc. Natl. Acad. Sci. USA 74, 1977, pp. 5463-5467) using
α-35S dATP from New England Nuclear (cf. M.D. Biggin et al.,
Proc. Natl. Acad. Sci. USA 80, 1983, pp. 3963-3965) was used
for all sequencing reactions. The reactions were catalysed by
modified T7 DNA polymerase from Pharmacia (cf. S. Tabor and
C.C. Richardson, Proc. Natl. Acad. Sci. USA 84, 1987, pp. 4767-
4771) and were primed with an oligonucleotide complementary to
the ADH1 promoter (ZC996: ATT GTT CTC GTT CCC TTT CTT),
complementary to the CYC1 terminator (ZC3635: TGT ACG CAT GTA
ACA TTA) or with oligonucleotides complementary to the DNA of
interest. Double stranded templates were denatured with NaOH
(E.Y. Chen and P.H. Seeburg, DNA 4, 1985, pp. 165-170) prior to
hybridizing with a sequencing oligonucleotide. Oligonucleotides
were synthesized on an Applied Biosystems Model 380A DNA
synthesizer. The oligonucleotides used for the sequencing
reactions are listed in the sequencing oligonucleotide table
below:



C-family cellobiohydrolase sequencing primers
ZC3411 TGA TGT CAA GTT CAT CAA
ZC3408 TCT GTA CGT CCA GAA CGG
ZC3407 ATG ACT TCT CTA AGA AGG
ZC3406 TCC AAC ATC AAG TTC GGT
ZC3410 AGG CCA ACT CCA TCT GAA
ZC3309 ATT ACC AAC ACC AGC GTT GAC ATC ACT GTC AGA GGG CTC C
ZC3409 CCG TTC TGG ACG TAC AGA

F-family cellulase specific sequencing primers
ZC3413 CCA TCG ACG GTA TTG GAT
ZC3311 CGG AGA GCA GCA GGA ACA CCA GAG GCA GGG TTC CAG CCA C
ZC3412 GAG GGT AGA GCG ATC GTT

C-family endoglucanase specific sequencing primers
ZC3739 TGA TCT CAT CGA GCT GCA CC
ZC3684 GTG ATG CTC AGT GCT ACG TC
ZC3310 AAC TCC GTT GAT GAA AGG AGT GAC GTA G
ZC3750 TCC AAT AGC TTC CCA GCA AG
ZC3683 TGT CCC TTG ATG TTG CCA AC
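Two of the cellobiohydrolase primers, ZC3408 and ZC3409, appear to target the same 18-base stretch on opposite strands. The sketch below checks this, reading ZC3409 as CCG TTC TGG ACG TAC AGA (the form given for the colony PCR in the screening section); the function name is illustrative.

```python
# Base-pairing table for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the opposite strand, written 5' -> 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


ZC3408 = "TCTGTACGTCCAGAACGG"
ZC3409 = "CCGTTCTGGACGTACAGA"  # reading taken from the screening section

print(reverse_complement(ZC3408))
print(reverse_complement(ZC3408) == ZC3409)
```

If the reading is right, the two primers allow the same region to be sequenced in both directions.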



The DNA sequences of the full-length cDNA clones, as well as
the derived amino acid sequences, are shown in the appended
Figs. 11 (C-family cellobiohydrolase), 12 (F-family cellulase)
and 13 (C-family endoglucanase).

Example 4

Isolation of endoglucanase EGI gene from H. insolens

The cDNA library described in example 1 was also screened with
a 35 bp oligonucleotide probe in the antisense configuration
with the sequence:

NOR-770: 5' GCTTCGCCCATGCCTTGGGTGGCGCCGAGTTCCAT 3'

The sequence was derived from the amino acid sequence of an
alcalase fragment of EGI purified from H. insolens, using our
knowledge of codon bias in this organism. Complete clones of
1.6 kb contained the entire coding sequence of 1.3 kb as shown
in Fig. 14A-E. The probe sequence NOR-770 is located at Met344-
Ala355.
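Because NOR-770 is given in the antisense configuration, the peptide it corresponds to is read from its reverse complement. The following sketch, which assumes the standard genetic code and uses illustrative function names, recovers the sense strand and translates the complete codons.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the opposite strand, written 5' -> 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def translate(sequence: str) -> str:
    """Translate complete codons from position 0; trailing bases are ignored."""
    usable = len(sequence) - len(sequence) % 3
    return "".join(CODONS[sequence[i:i + 3]] for i in range(0, usable, 3))


NOR_770 = "GCTTCGCCCATGCCTTGGGTGGCGCCGAGTTCCAT"  # antisense, as listed

sense = reverse_complement(NOR_770)
peptide = translate(sense)
print(sense)
print(peptide)
```

The sense strand starts with an ATG codon and yields eleven complete codons followed by the two bases GC. Every GCN codon encodes alanine, so this is consistent with the stated twelve-residue span from Met344 to Ala355.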

Construction of expression plasmids of EGI (full length) and
EGI' (truncated)

The EGI gene still containing the poly-A tail was inserted into
an A. oryzae expression plasmid as outlined in Fig. 2. The
coding region of EGI was cut out from the NcoI-site in the
initiating Met-codon to the BamHI-site downstream of the poly-
A region as a 1450 bp fragment from pHW480. This was ligated to
a 3.6 kb NcoI-NarI fragment from pSX224 (Fig. 1) containing the
TAKA promoter and most of pUC19, and to a 960 bp NarI-BamHI
fragment containing the remaining part with the AMG-terminator.
The 960 bp fragment was taken from p960 which is equivalent to
p777 (described in EP 238,023) except for the inserted gene.
The resulting expression plasmid is termed pHW485.

The expression plasmid pHW704 with the full length EGI gene
without poly A tail is shown in Fig. 3. From the BstEII site
1300 bp downstream of the NcoI-site was inserted a 102 bp
BstEII-BamHI linker (2645/2646) ligated to BglII-site in the
vector. The linker contains the coding region downstream of
BstEII-site with 2 stop codons at the end and a PvuI-site near
the C-terminal to be used for addition of CBD and B-regions.

Expression plasmid pHW697 with the truncated EGI' gene was
constructed similarly using a BstEII-BamHI linker (2492/2493)
of 69 bp. In this linker we introduced a PstI-site altering
Val421 to Leu421 and the last 13 amino acids of the coding
region: K423PKPKPGHGPRSD435 were eliminated. The short tail
with the rather unusual sequence was cut off to give EGI' a C-
terminal corresponding to the one found in T. reesei EGI just
upstream of the A and B-region.

Construction of an expression plasmid of EGI' with CBD and B
region from a ~43 kD endoglucanase added C-terminally

The ~43 kD endoglucanase of H. insolens described in DK patent
application No. 736/91 has shown good washing performance.
Besides the catalytic domain, 43 kD cellulase has a C-terminal
CBD and B region which has been transferred to EGI' which does
not have any CBD or B region itself. The construction was done
in 2 steps, as outlined in Fig. 4. The PstI-HincII linker
(028/030 M) intended to connect the C-terminal of EGI' to the
B-region of 43 kD cellulase, was subcloned in pUC19 PstI-EcoRI
with C-terminal HincII-EcoRI 100 bp fragment from 43 kD
cellulase gene in pSX320 (Fig. 5; as described in DK 736/91).
From the subclone pHW767 the CBD and B-region was cut out as a
250 bp PstI-BglII fragment and ligated to pHW485 (Fig. 2)
BstEII-BglII fragment of 5.7 kb and to the remaining BstEII-
PstI fragment of 55 bp from pHW697 (Fig. 3). The resulting
expression plasmid pHW768 has the ~43 kD endoglucanase CBD and
B region added to Gln422 of EGI'.

Construction of an expression plasmid of EGI with the CBD and
B region from ~43 kD endoglucanase added C-terminally

This plasmid was constructed in a similar way as pHW768 except
that, in this case, the C-terminal linker yielded the complete
sequence of EGI. Fig. 6 shows the procedure in 3 steps. The
PvuI-HincII linker (040 M/041 M) was subcloned in pUC18 to give
pHW775, into which a HincII-EcoRI 1000 bp fragment from pSX 320
(Fig. 5) was inserted to give pHW776. From this the CBD and B
region was cut out as a 250 bp PvuI-BglII fragment and ligated
to 5.7 kb BstEII-BglII fragment from pHW485 (Fig. 2) and 90 bp
BstEII-PvuI fragment from pHW704 (Fig. 3). The resulting
expression plasmid pHW777 contains the ~43 kD endoglucanase
CBD and B region added to Asp435 in the complete EGI sequence.

Expression in A. oryzae of EGI and EGI' with and without the
CBD and B region from ~43 kD endoglucanase

The expression plasmids pHW485, pHW704, pHW697, pHW768 and
pHW777 were transformed into A. oryzae IFO 4177 as described in
example 2. Supernatants from transformants grown in YPD medium
as described were analyzed by SDS-PAGE, where the native EGI
has an apparent Mw of 53 kD. EGI' looks slightly smaller as
expected, and the species with the added CBD and B region are
increased in molecular weight corresponding to the size of the
CBD and B region with some carbohydrate added. A polyclonal
antibody AS169 raised against the ~43 kD endoglucanase
recognizes EGI and EGI' only when the ~43 kD CBD and B region
are added, while all 4 species are recognized by a polyclonal
antibody AS78 raised against a cellulase preparation from H.
insolens. All 4 species have endoglucanase activity as measured
on soluble cellulose in the form of carboxy methyl cellulose.

Linkers

2492/2493: BstE2-PstI-BamH1

5' GTCACCTACACCAACCTCCGCTGGGGCGAG
3'      GATGTGGTTGGAGGCGACCCCGCTC

   ATCGGCTCGACCTACCAGGAGCTGCAGTAGTAA
   TAGCCGAGCTGGATGGTCCTCGACGTCATCATT

   TGATAG     3'   69 bp
   ACTATCCTAG 5'   68 bp

2645/2646: BstE2-XmaI-PvuI-BamH1

5' GTCACCTACACCAACCTCCGCTGGGGCGAGATCGGC
3'      GATGTGGTTGGAGGCGACCCCGCTCTAGCCG

   TCGACCTACCAGGAGGTTCAGAAGCCTAAGCCCAAG
   AGCTGGATGGTCCTCCAAGTCTTCGGATTCGGGTTC

   CCCGGGCACGGCCCCCGATCGGACTAATAG     3'   102 bp
   GGGCCCGTGCCGGGGGCTAGCCTGATTATCCTAG 5'   101 bp

028 M/030 M: PstI-HincII

5'     GTCCAGCAGCACCAGCTCTCCGGTC 3'   25 bp
3' ACGTCAGGTCGTCGTGGTCGAGAGGCCAG 5'   29 bp

040 M/041 M: PvuI-HincII

5'   CGTCCAGCAGCACCAGCTCTCCGGTC 3'   26 bp
3' TAGCAGGTCGTCGTGGTCGAGAGGCCAG 5'   28 bp
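The linker listings can be checked for internal consistency: the two strands should pair base for base, leaving only the single-stranded ends that match the flanking restriction sites. The sketch below does this for linker 2492/2493. It assumes the strands as transcribed from the listing (upper strand 5' to 3', lower strand 3' to 5') and a 5-base BstEII overhang at the left end; the function and variable names are illustrative.

```python
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Linker 2492/2493, joined from the three listing rows.
TOP = (  # written 5' -> 3'
    "GTCACCTACACCAACCTCCGCTGGGGCGAG"
    "ATCGGCTCGACCTACCAGGAGCTGCAGTAGTAA"
    "TGATAG"
)
BOTTOM = (  # written 3' -> 5', as in the listing
    "GATGTGGTTGGAGGCGACCCCGCTC"
    "TAGCCGAGCTGGATGGTCCTCGACGTCATCATT"
    "ACTATCCTAG"
)


def check_duplex(top: str, bottom: str, left_overhang: int):
    """Check pairing and report the single-stranded ends.

    `left_overhang` is the number of unpaired bases at the 5' end of `top`.
    Returns (all_paired, left_end, right_end), both ends written 5' -> 3'.
    """
    paired_top = top[left_overhang:]
    all_paired = all(PAIR[t] == b for t, b in zip(paired_top, bottom))
    left_end = top[:left_overhang]
    # Bottom-strand bases extending past the 3' end of the top strand.
    right_end = bottom[len(paired_top):][::-1]
    return all_paired, left_end, right_end


ok, left_end, right_end = check_duplex(TOP, BOTTOM, left_overhang=5)
print(len(TOP), len(BOTTOM), ok, left_end, right_end)
print("PstI site present:", "CTGCAG" in TOP)
```

Under these assumptions the strands are 69 and 68 bases long and fully complementary over the paired region. The left end is a GTCAC overhang (BstEII, GTNAC), the right end a GATC overhang (BamHI), and the upper strand contains the CTGCAG PstI site introduced by the linker.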



Example 5

~43 kD endoglucanase with different CBDs and B-regions:

In order to test the influence on the ~43 kD endoglucanase of
the different CBDs and B regions from the A region clones we
have substituted the original CBD and B region from ~43 kD
with the other C-terminal CBDs and B regions, i.e. A-1, A-8,
A-9, A-11, and A-19 (cf. Example 1). In order to test the
concept we have also made a construction where the 43 kD B
region has been deleted.



Fragments:

All fragments were made by PCR amplification using a Perkin-
Elmer/Cetus DNA Amplification System following the
manufacturer's instructions.

1) A PCR fragment was made which covers the DNA from 56
bp upstream of the Bam HI site on pSX 320 (Fig. 5) to 717 bp
within the coding region of the ~43 kD endoglucanase gene and
at the same time introduces a Kpn I site at pos. 708 and a Sma
I site at pos. 702 in the coding region which is at the very
beginning of the B region. This PCR fragment was made with the
primers NOR 1542 and NOR 3010 (see list of oligonucleotides
below).

2) A PCR fragment was made which includes the CBD and B
region of A-1 introducing a Kpn I site at the very beginning of
the B region in frame with the Kpn I site introduced in 1) and
introducing a Xho I site downstream of the coding region of the
gene. Primers used: NOR 3012 upstream and NOR 3011 downstream.

3) As 2) except that the fragment covered the CBD and B
region of A-8 and the Xho I site in the expression vector
downstream of gene. Primers: NOR 3017 and NOR 2516.

4) As 2) but with primers NOR 3016 and NOR 3015
covering the CBD and B region from A-9.

5) As 3) but with primers NOR 3021 and NOR 2516 covering
the CBD and B region from A-11.

6) As 2) but with primers NOR 3032 and NOR 3022 covering
the CBD and B region from A-19.

7) A PCR fragment which includes the CBD from ~43 kD
endoglucanase and the Xho I site downstream from the gene on
pSX 320 introducing a Pvu II site at the very end of the B
region. Primers: NOR 3023 and NOR 2516.

35 

Combinations:

1) + 2): inserted as Bam HI - Kpn I and Kpn I - Xho I into pToC
68 (described in DK 736/91) Bam HI - Xho I, thus coding for the
43 kD core enzyme with the CBD and B region from A-1.

1) + 3): As above, giving a 43 kD enzyme with the A-8 CBD and B
region.

1) + 4): As above, but with the A-9 CBD and B region.

1) + 5): As above, but with the A-11 CBD and B region.

1) + 6): As above, but with the A-19 CBD and B region.

1) + 7): inserted as Bam HI - Sma I and Pvu II - Xho I into pToC
68 Bam HI - Xho I, thus coding for the 43 kD enzyme without the
B region.

Oligonucleotides:

NOR 1542: 5' - CGACAACATCACATCAAGCTCTCC - 3'
NOR 2516: 5' - CCATCCTTTAACTATAGCGA - 3'
NOR 3010: 5' - GCTGGTGCTGGTACCCGGGATCTGGACGGCAGGG - 3'   (Kpn, Sma)
NOR 3011: 5' - GCATCGGTACCGGCGGCGGCTCCACTGGCG - 3'       (Kpn)
NOR 3012: 5' - CTCACTCCATCTCGAGTCTTTCAATTTACA - 3'       (Xho)
NOR 3015: 5' - CTTTTCTCGAGTCCCTTAGTTCAAGCACTGC - 3'      (Xho)
NOR 3016: 5' - TGACCGGTACCGGCGGCGGCAACACCAACC - 3'       (Kpn)
NOR 3017: 5' - TCACCGGTACCGGCGGTGGAAGCAACAATG - 3'       (Kpn)
NOR 3021: 5' - TCTTCGGTACCAGCGGCAACAGCGGCGGCG - 3'       (Kpn)
NOR 3022: 5' - CGCTGGGTACCAACAACAATCCTCAGCAGG - 3'       (Kpn)
NOR 3023: 5' - CTCCCAGCAGCTGCACTGCTGAGAGGTGGG - 3'       (Pvu II)
NOR 3032: 5' - CGGCCTCGAGACCTTACAGGCACTGCGAGT - 3'       (Xho)
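The restriction sites annotated on the primers can be located by a plain substring search, because the recognition sequences involved (Kpn I GGTACC, Sma I CCCGGG, Xho I CTCGAG, Pvu II CAGCTG) are palindromic and therefore read the same on either strand. The sketch below is illustrative only: `find_sites` is a hypothetical helper, and only a subset of the primers is transcribed.

```python
# Recognition sequences of the enzymes named in the primer annotations.
SITES = {
    "Kpn I": "GGTACC",
    "Sma I": "CCCGGG",
    "Xho I": "CTCGAG",
    "Pvu II": "CAGCTG",
}

# A subset of the primers from the list of oligonucleotides (5'->3').
PRIMERS = {
    "NOR 3010": "GCTGGTGCTGGTACCCGGGATCTGGACGGCAGGG",
    "NOR 3011": "GCATCGGTACCGGCGGCGGCTCCACTGGCG",
    "NOR 3012": "CTCACTCCATCTCGAGTCTTTCAATTTACA",
    "NOR 3023": "CTCCCAGCAGCTGCACTGCTGAGAGGTGGG",
    "NOR 3032": "CGGCCTCGAGACCTTACAGGCACTGCGAGT",
}


def find_sites(seq: str) -> dict[str, int]:
    """Map each enzyme whose site occurs in `seq` to its 0-based start index."""
    found = {}
    for enzyme, site in SITES.items():
        pos = seq.find(site)
        if pos != -1:
            found[enzyme] = pos
    return found


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

In NOR 3010 the Kpn I and Sma I sites overlap by two bases (GGTACCCGGG), which is consistent with the two sites being introduced six positions apart (pos. 702 and 708) in fragment 1).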

10 

Example 6

Fusion of a bacterial catalytic domain to a fungal CBD

The endoglucanase Endo 1 produced by Bacillus lautus NCIMB
40250 (described in PCT/DK91/00013) consists of a catalytic
domain (core) (Ala(32) - Val(555)) and a C-terminal cellulose
binding domain (CBD) (Gln556 - Pro700) homologous to the CBD of
a B. subtilis endoglucanase (R.M. MacKay et al. 1986, Nucleic
Acids Res. 14, 9159-70). The CBD is proteolytically cleaved off
when the enzyme is expressed in B. subtilis or E. coli,
generating a CMC-degrading core enzyme. In this example this
core protein was fused with the B region and CBD of the ~43 kD
endoglucanase from Humicola insolens (described in DK 736/91).

25 

Construction of the fusion.

The plasmid pCaHj 170 containing the cDNA gene encoding the
~43 kD endoglucanase was constructed as shown in Fig. 7. pCaHj
170 was digested with Xho II and Sal I. The 223 bp Xho II - Sal
I fragment was isolated and ligated into pUC 19 (Yanisch-Perron
et al. 1985, Gene 33, 103-119) digested with BamH I and Sal I.
The BamH I site was regenerated by this Xho II - BamH I ligation.
The resulting plasmid, IM 2, was digested with EcoR I and BamH
I and ligated with the linker NOR 3045 - NOR 3046:

NOR 3045   5' AATTCCGCGGAACGATATCTCCGA 3'
NOR 3046   3'     GGCGCCTTGCTATAGAGGCTCTAG 5'
Sites: EcoR I, Sac II, EcoR V, Mbo I

The resulting plasmid, IM 3, was digested with EcoR V and Sac II
and ligated to the 445 bp Hinc II - Sac II pPL 517 fragment.
pPL 517 contains the entire Bacillus Endo 1 gene
(PCT/DK91/00013). The product of this ligation was termed IM 4.
In order to replace the Bacillus signal peptide of Endo 1 with
the fungal signal peptide from the 43 kdal endoglucanase, four
PCR primers were designed for "Splicing by Overlap Extension"
(SOE) fusion (R. M. Horton et al. (1989), Gene 77, 61-68). The 43
kD signal sequence was amplified from the plasmid pCaHj 109 (DK
736/91), introducing a Bcl I site in the 5' end and a 21 bp
homology to the Bacillus Endo 1 gene in the 3' end, using the 5'
primer NOR 3270 and the 3' primer NOR 3275. The part of the
Endo 1 gene 5' to the unique Sac II site was amplified using
the 5' primer NOR 3276, introducing a 21 bp homology to the 43
kdal gene, and the 3' primer NOR 3271 covering the Sac II site.
The two PCR fragments were mixed, melted, annealed and filled up
with Taq polymerase (Fig. 9). The resulting hybrid was
amplified using the primers NOR 3270 and NOR 3271. The hybrid
fragment was digested with Bcl I and Sac II and ligated to the
676 bp Sac II - Sal I fragment from IM 4 and the Aspergillus
expression vector pToC 68 (DK 736/91) digested with BamH I. The
product of this ligation, pCaHj 180 (Fig. 10), contained an
open reading frame encoding the 43 kD signal peptide and the
first four N-terminal amino acids of the mature ~43 kD
endoglucanase (Met(1)-Arg(25)) fused to the core of Endo 1
(Ser(34)-Val(549)) followed by the peptide Ile-Ser-Glu (encoded
by the linker) fused to the 43 kD B region and CBD (Ile(233)-
Leu(285)). pCaHj 180 was used to transform Aspergillus oryzae
IFO 4177 using selection on acetamide by cotransformation with
pToC 90 (cf. DK 736/91) as described in published EP patent
application No. 238 023.



NOR 3270   5' TTGAATTCTGATCAAGATGCGTTCCTCCC 3'
NOR 3275   5' AATGGTGAAAGTGACATCACTCCTGCCATCAGCGGCAAGGGC 3'
NOR 3276   5' GCCCTTGCCGCTGATGGCAGGAGTGATGTCACTTTCACCATT 3'
NOR 3271   5' AGCGCGTCCGCGGTAGCTATG 3'

The sequence of the Endo 1 core and the ~43 kD CBD and B
region is shown in the appended Fig. 15A-D.
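The overlap used in the SOE step can be illustrated with the two inner primers. The sketch below (helper names are hypothetical, and the standard genetic code is assumed) checks that NOR 3275 and NOR 3276 are exact reverse complements of each other, so the two PCR products share a 42 bp end through which they can anneal. It then translates NOR 3276 to show the reading frame running across the junction between its two 21-base halves.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

NOR_3275 = "AATGGTGAAAGTGACATCACTCCTGCCATCAGCGGCAAGGGC"
NOR_3276 = "GCCCTTGCCGCTGATGGCAGGAGTGATGTCACTTTCACCATT"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate in frame 1 using one-letter amino acid codes."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    print(reverse_complement(NOR_3276) == NOR_3275)
    # First 21 bases: end of the 43 kD part; last 21 bases: start of Endo 1.
    print(translate(NOR_3276[:21]), "|", translate(NOR_3276[21:]))
```

The translated junction ends in Arg on the 43 kD side and begins with Ser on the Endo 1 side, in line with the Met(1)-Arg(25) and Ser(34)-Val(549) boundaries stated for pCaHj 180.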
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CLAIMS 

1. A cellulose- or hemicellulose-degrading enzyme which is
derivable from a fungus other than Trichoderma or
Phanerochaete, and which comprises a carbohydrate binding
domain homologous to a terminal A region of Trichoderma reesei
cellulases, which carbohydrate binding domain comprises the
following amino acid sequence

1                                   10
Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa

            20                                      30
Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa

Xaa

or a subsequence thereof capable of effecting binding of the
enzyme to an insoluble cellulosic or hemicellulosic substrate.

2. An enzyme according to claim 1, which is derivable from a
strain of Humicola, Fusarium or Myceliophthora.

3. An enzyme according to claim 1, wherein the variations in
the amino acid sequence shown in claim 1 are selected as
follows

in position 1, the amino acid is Trp or Tyr;
in position 2, the amino acid is Gly or Ala;
in position 7, the amino acid is Gln, Ile or Asn;
in position 8, the amino acid is Gly or Asn;
in position 9, the amino acid is Trp, Phe or Tyr;
in position 10, the amino acid is Ser, Asn, Thr or Gln;
in position 12, the amino acid is Pro, Ala or Cys;
in position 13, the amino acid is Thr, Arg or Lys;
in position 14, the amino acid is Thr, Cys or Asn;
in position 18, the amino acid is Gly or Pro;
in position 19, the amino acid (if present) is Ser, Thr, Phe,
Leu or Ala;
in position 20, the amino acid is Thr or Lys;
in position 24, the amino acid is Gln or Ile;
in position 26, the amino acid is Gln, Asp or Ala;
in position 27, the amino acid is Trp, Phe or Tyr;
in position 29, the amino acid is Ser, His or Ala; and/or
in position 32, the amino acid is Leu, Ile, Gln, Val or Thr.

4. An enzyme according to claim 3, wherein the carbohydrate
binding domain comprises the following amino acid sequence

Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu
Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Trp Asn Gly Pro Thr Thr Cys Val
Ser Gly Ala Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Phe Asn Gly Pro Thr Cys Cys Gln
Ser Gly Ser Thr Cys Val Lys Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr Cys Ala
Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln Cys Thr
Pro;

Trp Gly Gln Cys Gly Gly Gln Gly Trp Gln Gly Pro Thr Cys Cys Ser
Gln Gly - Thr Cys Arg Ala Gln Asn Gln Trp Tyr Ser Gln Cys Leu
Asn;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Thr Asn Cys Glu
Ala Gly Ser Thr Cys Arg Gln Gln Asn Ala Tyr Tyr Ser Gln Cys
Ile;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Arg Asn Cys Glu
Ser Gly Ser Thr Cys Arg Ala Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr Cys Val
Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Gln Asn Tyr Ser Gly Pro Thr Thr Cys Lys
Ser Pro Phe Thr Cys Lys Lys Ile Asn Asp Phe Tyr Ser Gln Cys
Gln; or

Trp Gly Gln Cys Gly Gly Asn Gly Trp Thr Gly Ala Thr Thr Cys Ala
Ser Gly Leu Lys Cys Glu Lys Ile Asn Asp Trp Tyr Tyr Gln Cys Val
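The relationship between the generic sequence of claim 1 and the specific sequences of claim 4 can be sketched as a pattern match. The regular expression below is an illustrative reading, not a definition taken from the claims: it keeps the fixed residues of claim 1 (Gln 3, Cys 4, Gly 5, Gly 6, Gly 11, Cys 15, Cys 21, Asn 25, Tyr 28, Gln 30, Cys 31), treats every Xaa as any residue, and assumes that position 19 and position 33 may be absent, since several of the listed sequences show a gap at position 19 or end at position 32. Four of the ten sequences are transcribed in one-letter code, with the gap character dropped.

```python
import re

# Generic CBD pattern in one-letter code (assumption: positions 19 and 33
# are optional, which is why two of the quantifiers are ranges).
CBD_PATTERN = re.compile(
    r"..QCGG"      # positions 1-6
    r".{4}G"       # positions 7-11
    r".{3}C"       # positions 12-15
    r".{4,5}C"     # positions 16-21 (position 19 optional)
    r".{3}N"       # positions 22-25
    r"..Y.QC"      # positions 26-31
    r".{1,2}"      # positions 32-33 (position 33 optional)
)

# Sequences 1, 4, 9 and 10 of claim 4, transcribed to one-letter code.
CLAIM_4_EXAMPLES = [
    "WGQCGGQGWNGPTCCEAGTTCRQQNQWYSQCL",
    "WGQCGGNGYSGPTTCAEGTCKKQNDWYSQCTP",
    "WGQCGGQNYSGPTTCKSPFTCKKINDFYSQCQ",
    "WGQCGGNGWTGATTCASGLKCEKINDWYYQCV",
]


def matches_cbd(seq: str) -> bool:
    """True if the whole sequence fits the generic pattern."""
    return CBD_PATTERN.fullmatch(seq) is not None


if __name__ == "__main__":
    for seq in CLAIM_4_EXAMPLES:
        print(seq, matches_cbd(seq))
    # Replacing the conserved Cys at position 4 breaks the match.
    print(matches_cbd("WGQAGGQGWNGPTCCEAGTTCRQQNQWYSQCL"))
```

A match of this kind only shows agreement with the fixed positions; it says nothing about whether a given sequence actually binds cellulose.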

5. An enzyme according to any of claims 1-4, which further
comprises an amino acid sequence which defines a linking B
region connecting the carbohydrate binding domain to the
catalytically active domain of the enzyme.

6. An enzyme according to claim 5, wherein the linking B region
is one which is enriched in the amino acids glycine and/or
asparagine and/or proline and/or serine and/or threonine and/or
glutamine.

7. An enzyme according to claim 6, wherein one or more of said
amino acids appear in short, repetitive units.

8. An enzyme according to any of claims 1-7, which comprises a
carbohydrate binding domain derived from one naturally
occurring cellulose- or hemicellulose-degrading enzyme, an
amino acid sequence defining a linking B region, which amino
acid sequence is derived from another naturally occurring
cellulose- or hemicellulose-degrading enzyme, as well as a
catalytically active domain derived from the enzyme supplying
either the carbohydrate binding domain or B region or from a
third enzyme.

9. An enzyme according to claim 8, wherein the catalytically
active domain is derived from an enzyme which does not, in
nature, comprise a carbohydrate binding domain or B region.

10. An enzyme according to any of claims 1-9 which is a
cellulase, e.g. an endoglucanase, cellobiohydrolase or
β-glucosidase.

11. A DNA construct which comprises a DNA sequence encoding an
enzyme according to any of claims 1-10.

12. An expression vector which carries an inserted DNA
construct according to claim 11.

13. A cell which is transformed with a DNA construct according
to claim 11 or with an expression vector according to claim 12.

14. A cell according to claim 13 which is a fungal cell, e.g.
belonging to a strain of Aspergillus, e.g. Aspergillus niger or
Aspergillus oryzae, or a yeast cell, e.g. belonging to a strain
of Saccharomyces, such as Saccharomyces cerevisiae.

15. A method of producing an enzyme according to any of claims
1-10, wherein a cell according to claim 13 or 14 is cultured
under conditions conducive to the production of the enzyme, and
the enzyme is subsequently recovered from the culture.

16. An agent for degrading cellulose or hemicellulose, the
agent comprising an enzyme according to any of claims 1-10.

17. An agent according to claim 16 comprising a combination of
two or more enzymes according to any of claims 1-10, or a
combination of one or more enzymes according to any of claims
1-10 with one or more other enzymes with cellulose- or
hemicellulose-degrading activity.

18. A carbohydrate binding domain homologous to a terminal A
region of Trichoderma reesei cellulases, which carbohydrate
binding domain comprises the following amino acid sequence

1                                   10
Xaa Xaa Gln Cys Gly Gly Xaa Xaa Xaa Xaa Gly Xaa Xaa Xaa Cys Xaa

            20                                      30
Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Asn Xaa Xaa Tyr Xaa Gln Cys Xaa

Xaa

or a subsequence thereof capable of effecting binding of a
protein to an insoluble cellulosic or hemicellulosic substrate.

19. A carbohydrate binding domain according to claim 18,
wherein the variations in the amino acid sequence shown in
claim 18 are selected as follows

in position 1, the amino acid is Trp or Tyr;
in position 2, the amino acid is Gly or Ala;
in position 7, the amino acid is Gln, Ile or Asn;
in position 8, the amino acid is Gly or Asn;
in position 9, the amino acid is Trp, Phe or Tyr;
in position 10, the amino acid is Ser, Asn, Thr or Gln;
in position 12, the amino acid is Pro, Ala or Cys;
in position 13, the amino acid is Thr, Arg or Lys;
in position 14, the amino acid is Thr, Cys or Asn;
in position 18, the amino acid is Gly or Pro;
in position 19, the amino acid (if present) is Ser, Thr, Phe,
Leu or Ala;
in position 20, the amino acid is Thr or Lys;
in position 24, the amino acid is Gln or Ile;
in position 26, the amino acid is Gln, Asp or Ala;
in position 27, the amino acid is Trp, Phe or Tyr;
in position 29, the amino acid is Ser, His or Tyr; and/or
in position 32, the amino acid is Leu, Ile, Gln, Val or Thr.

20. A carbohydrate binding domain according to claim 19,
wherein the carbohydrate binding domain comprises the following
amino acid sequence

Trp Gly Gln Cys Gly Gly Gln Gly Trp Asn Gly Pro Thr Cys Cys Glu
Ala Gly Thr Thr Cys Arg Gln Gln Asn Gln Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Trp Asn Gly Pro Thr Thr Cys Val
Ser Gly Ala Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Ile Gly Phe Asn Gly Pro Thr Cys Cys Gln
Ser Gly Ser Thr Cys Val Lys Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Asn Gly Tyr Ser Gly Pro Thr Thr Cys Ala
Glu Gly - Thr Cys Lys Lys Gln Asn Asp Trp Tyr Ser Gln Cys Thr
Pro;

Trp Gly Gln Cys Gly Gly Gln Gly Trp Gln Gly Pro Thr Cys Cys Ser
Gln Gly - Thr Cys Arg Ala Gln Asn Gln Trp Tyr Ser Gln Cys Leu
Asn;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Thr Asn Cys Glu
Ala Gly Ser Thr Cys Arg Gln Gln Asn Ala Tyr Tyr Ser Gln Cys
Ile;

Trp Gly Gln Cys Gly Gly Gln Gly Tyr Ser Gly Cys Arg Asn Cys Glu
Ser Gly Ser Thr Cys Arg Ala Gln Asn Asp Trp Tyr Ser Gln Cys
Leu;

Trp Ala Gln Cys Gly Gly Asn Gly Trp Ser Gly Cys Thr Thr Cys Val
Ala Gly Ser Thr Cys Thr Lys Ile Asn Asp Trp Tyr His Gln Cys
Leu;

Trp Gly Gln Cys Gly Gly Gln Asn Tyr Ser Gly Pro Thr Thr Cys Lys
Ser Pro Phe Thr Cys Lys Lys Ile Asn Asp Phe Tyr Ser Gln Cys
Gln; or

Trp Gly Gln Cys Gly Gly Asn Gly Trp Thr Gly Ala Thr Thr Cys Ala
Ser Gly Leu Lys Cys Glu Lys Ile Asn Asp Trp Tyr Tyr Gln Cys Val
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21. A linking B region derived from a cellulose- or 
hemicellulose-degrading enzyme, said region comprising an amino
acid sequence enriched in the amino acids glycine and/or
asparagine and/or proline and/or serine and/or threonine and/or
glutamine.

22. A B region according to claim 21, wherein one or more of
said amino acids appear in short, repetitive units.

23. A B region according to claim 21 or 22, which comprises the
following amino acid sequence

Ala Arg Thr Asn Val Gly Gly Gly Ser Thr Gly Gly Gly Asn Asn Gly
Gly Gly Asn Asn Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro
Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Pro Gly Gly Asn Cys
Ser Pro Leu;

Pro Gly Gly Asn Asn Asn Asn Pro Pro Pro Ala Thr Thr Ser Gln Trp
Thr Pro Pro Pro Ala Gln Thr Ser Ser Asn Pro Pro Pro Thr Gly Gly
Gly Gly Gly Asn Thr Leu His Glu Lys;

Gly Gly Ser Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn
Asn Gly Gly Gly Gly Asn Asn Asn Gly Gly Gly Asn Asn Asn Gly Gly
Gly Asn Thr Gly Gly Gly Ser Ala Pro Leu;

Val Phe Thr Cys Ser Gly Asn Ser Gly Gly Gly Ser Asn Pro Ser Asn
Pro Asn Pro Pro Thr Pro Thr Thr Phe Ile Thr Gln Val Pro Asn Pro
Thr Pro Val Ser Pro Pro Thr Cys Thr Val Ala Lys;

Pro Ala Leu Trp Pro Asn Asn Asn Pro Gln Gln Gly Asn Pro Asn Gln
Gly Gly Asn Asn Gly Gly Gly Asn Gln Gly Gly Gly Asn Gly Gly Cys
Thr Val Pro Lys;

Pro Gly Ser Gln Val Thr Thr Ser Thr Thr Ser Ser Ser Ser Thr Thr
Ser Arg Ala Thr Ser Thr Thr Ser Ala Gly Gly Val Thr Ser Ile Thr
Thr Ser Pro Thr Arg Thr Val Thr Ile Pro Gly Gly Ala Ser Thr Thr
Ala Ser Tyr Asn;

Glu Ser Gly Gly Gly Asn Thr Asn Pro Thr Asn Pro Thr Asn Pro Thr
Asn Pro Thr Asn Pro Thr Asn Pro Trp Asn Pro Gly Asn Pro Thr Asn
Pro Gly Asn Pro Gly Gly Gly Asn Gly Gly Asn Gly Gly Asn Cys Ser
Pro Leu; or

Pro Ala Val Gln Ile Pro Ser Ser Ser Thr Ser Ser Pro Val Asn Gln
Pro Thr Ser Thr Ser Thr Thr Ser Thr Ser Thr Thr Ser Ser Pro Pro
Val Gln Pro Thr Thr Pro Ser Gly Cys Thr Ala Glu Arg
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BamHI 



[Plasmid construction scheme; drawing not reproduced. Legible
detail: Bal I - BamH I linker
5' GATCCACCATGG 3'   NOR-969
3'     GTGGTACC 5'   NOR-970]

[Fig. 2: restriction map; drawing not reproduced.]

[Fig. 3: restriction map; drawing not reproduced. Legible labels:
EG I' (truncated), EG I (complete).]

[Fig. 4: restriction map; drawing not reproduced.]

[Fig. 5: restriction map; drawing not reproduced. Legible site
positions: Nde I 183, Bbe I 235, Nar I 235, BstX I 391, Sma I 548,
BspM II 963, Bgl II 1148, Asp718 1196, Kpn I 1196, Nae I 1800,
BssH II 2810, Stu I 3220, Sal I 3240, Sph I 3252, Hind III 3285,
Afl III 3644, Xmn I 5132, Aat II 5455.]

[Fig. 6: plasmid construction scheme; drawing not reproduced.
Legible labels: pHW775, 040/041 M linker (Pvu I - Hinc II).]

[Fig. 7: plasmid construction scheme; drawing not reproduced.]

[Fig. 8: plasmid construction scheme; drawing not reproduced.
Legible steps: 1. Digestion with BamH I, EcoR I. 2. Insertion of
linker NOR 3045 - NOR 3046.]

[Fig. 9: SOE fusion scheme; drawing not reproduced. Legible
labels: NOR 3276, NOR 3270, NOR 3271, pPL 517, Bcl I, Sac II;
steps "mix, melt, anneal" and "72°C, nucleotides, Taq
polymerase"; product: 43 kdal signal, Endo 1 N terminal.]

[Fig. 10: plasmid map; drawing not reproduced. Legible labels:
Bcl I, Sac II, 43 kdal signal, Endo 1 N terminal.]
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agaccggaattcgcggccgccatctatccaacggtctagcttcacttcacaatgtatcgc 

m y r 

atcgtcgcaaccgcctcggctcttattgccgctgctcgggctcaacaggtctgctctttg 

IVATASALIAAARAQQVCSL 

aacaccgagaccaagcctgccttgacctggtccaagtgtacatccagcggctgcagcgat 

NTETKPALTWSKCTSSGCSD 

gtcaagggctccgttgttattgatgccaactggcgatggactcaccagacttctgggtct 

VKGSVVIDANWRWTHQTSGS 

accaactgttacaccggaaacaagtgggacacctccatctgcactgatggcaagacctgc 

TNCYTGNKWDTSICTDGKTC

GCCGAAAAGTGCTGTCTTGATGGCGCCGACTATTCTGGTACCTACGGAATCACCTCCAGC 

AEKCCLDGADYSGTYGITSS 

ggcaaccagctcagtcttggattcgtcaccaacggtccctacagcaagaacatcggcagc 

GNQLSLGFVTNGPYSKNIGS 

cgaacctacctcatggagaacgagaacaccatccagatgttccagcttctgggcaacgag 

RTYLMENENTIQMFQLLGNE

ttcacctttgatgtcgatgtctctggtatcggctgcggtctgaacggtgcccctcacttc 

FTFDVDVSGIGCGLNGAPHF 

gtcagcatggacgaggatggtggcaaggccaagtactccggaaacaaggccggagccaag 

VSMDEDGGKAKYSGNKAGAK 

tacggaactggcTACtGTGATGccCAgTGCCCTCGTGATGTCAAGTTCATCAACGGAGTT

YGTGYCDAQCPRDVKFINGV 

GCCAACTCTGAGGGCTGGAAGCCCTCTGACAGTGATGTCAACGCtggtgttggtaatctg 

ANSEGWKPSDSDVNAGVGNL

ggcacctgctgccccgagatggatatctgggaggccaactccatctccaccgccttcact 

GTCCPEMDIWEANSISTAFT 

cctcatccttgcaccaagctcacacagcactcttgcactggcgactcttgtggtggaacc

PHPCTKLTQHSCTGDSCGGT 

tactctagtgaccgatatggcggtacttgcgatgccgacggttgtgatttcaatgcctac 

YSSDRYGGTCDADGCDFNAY 

cgtcagggcaacaagaccttctacggtcctggatccaacttcaacatcgacaccaccaag 

RQGNKTFYGPGSNFNIDTTK 

aagatgactgttgtcactcagttccacaagggcAGCAAcGGACGTCTTTCTGAGATCACC 

KMTVVTQFHKGSNGRLSEIT

CGTCTGTACGTCCAGAACGGCAAGGTCATTGCCAACTCAGAGTCCAAGATTGCAGGCAAC 

RLYVQNGKVIANSESKIAGN 

CCCGGTAGCTCTCTCACCTCTGACTTCTGCTCCAAGCAGAAGAGCGTCTTTGGCGATATC 

PGSSLTSDFCSKQKSVFGDI 

GATGACTTCTCTAAGAAGGGTGGCTGGAACGgCATGAGCGATGCTCTCTCTGCCCCTATG 

DDFSKKGGWNGMSDALSAPM

GTTCTTGTTATGTCTCTCTGGCACGACCACCACTCCAAcATGCtcTGGCTgGACtctacc 

VLVMSLWHDHHSNMLWLDST 

tacccaaccgactctaccaaggttggatctcaacgaggttcttgcgctaccacctctggc 

YPTDSTKVGSQRGSCATTSG 

aagccctccgaccttgagcgagatgttcccaactccaaggtttccttctccaacatcaAG 

KPSDLERDVPNSKVSFSNIK 

TTCGGTCCCATCGGAAGCACCTACAAGAGCGACGGCACCACCCCCAACCCCCCTgCCAGC 

FGPIGSTYKSDGTTPNPPAS 

AGCAGCACCACTGGTTCTTCCACTCCCACCAACCCCCCTGCCGGTAGCGTCGACCAATGG 

SSTTGSSTPTNPPAGSVDQW 

GGACAgTGcGGTGGCCAgaactacagcggccccacgacctgcaagtctcctttcacctgc 

GQCGGQNYSGPTTCKSPFTC 

aagaagatcaacgacttctactcccagtgtcagtaaaggggctgccgagctatctagcat 

KKINDFYSQCQ

gagattgagaaacgatgtgatgagtggacgatcaaggagaagtgtgtggatgatatgaac 
ttgatgtgggaggac

Fig. 11
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gaattcgcggccgcctgcttcgaagcatcagctcattgagatcagtcaaaatgcatacc 

M H T 

ctttcggttctcctcgctctcgctcccgtgtccgcccttgctcaggctcccatctgggga 

LSVLLALAPVSALAQAPIWG 

cagtgcggtggcaatggttggaccggtgctacaacctgcgctagtggtctgaagtgtgag 

QCGGNGWTGATTCASGLKCE 

aagatcaacgactggtactatcagtgtgttcctggatctggaggatctgaaccccagcct 

KINDWYYQCVPGSGGSEPQP 

tcgtcaactcagggtggtggcactcctcagcctactggcggtaacagcggcggcactggt 

SSTQGGGTPQPTGGNSGGTG

ctcgacgccaaattcaaggccaagggcaagcagtactttggtaccgagattgaccactac 

LDAKFKAKGKQYFGTEIDHY 

caccttaacaacaatcctctgatcaacattgtcaaggcccagtttggccaagtgacatgc 

HLNNNPLINIVKAQFGQVTC 

gagaacagcatgaagtgggatgccattgagccttcacgcaactccttcaccttcagtaac 

ENSMKWDAIEPSRNSFTFSN 

gctgacaaggtcgtcgacttcgccactcagaacggcaagctcatccgtgGCCACACTCTT 

ADKVVDFATQNGKLIRGHTL 

CTCTGGCACTCTCAGCTGCCTCAGTGGGTTCAGAACATCAACGATCGCTCTACCCTCACC 

LWHSQLPQWVQNINDRSTLT

GCGGTCATCGAGAACCACGTCAAGACCATGGTCACCCGCTACAAGGGCAAGATCCTCCAG 

AVIENHVKTMVTRYKGKILQ 

TGGGACGTTGTCAACAACGAGATCTTCGCTGAGGACGGTAACCTCCGCGACAGTGTCTTC 

WDVVNNEIFAEDGNLRDSVF

AGCCGAGTTCTCGGTGAGGACTTTGTCGGTATTGCTTTCCGCGCTGCCCGCGCCGCTGAT 

SRVLGEDFVGIAFRAARAAD

CCCGCTGCCAAGCTCTACATCAACGATTATAACCTCGACAAGTCCGACTATGCTAAGGTC 

PAAKLYINDYNLDKSDYAKV

ACCCGCGGAATGGTCGCTCACGTTAATAAGTGGATTGCTGCTGGTATTCCCATCGACGGT 

TRGMVAHVNKWIAAGIPIDG 

ATTGGATCTCAGGGCCATCTTGCTGCTCCTAGTGGCTGGAACCCTGCCTCTGGTGTTCCT 

IGSQGHLAAPSGWNPASGVP

GCTGCTCTCCGAGCTCTTGCCGCCTCGGACGCCAAGGAGATTGCTATcactgagcttgat 

AALRALAASDAKEIAITELD 

attgccggtgccagtgctaacgattaccttactgtcatgaacgcttgccttgccgttccc 

IAGASANDYLTVMNACLAVP 

aagtgtgtcggcatcactgtctggggtgtctctgacaaggactcgtggcgacctggtgac 

KCVGITVWGVSDKDSWRPGD 

aaccccctcctctacgacagcaactaccagcccaaggctgctttcaatgccttggctaac 

NPLLYDSNYQPKAAFNALAN 

gctctgtgagctgttgttgatgtatgtcgctggatcatacaacgaaacgtcctagttgga 

A L • 

taaagcgttgatggtagaatgat 



Fig. 12 
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gaattcgcggccgcctagataagtcactacctgatctctgaataatctttcatcatgaag 

M K 

tctctctcactcatcctctcagccctggctgtccaggtcgctgttgctcaaacccccgac 

SLSLILSALAVQVAVAQTPD 

aaggccaaggagcagcaccccaagctcgagacctaccgctgcaccaaggcctctggctgc 

KAKEQHPKLETYRCTKASGC 

aagaagcaaaccaactacatcgtcgccgaCgcaggtattcacggcattCgcagaagcgCC 

KKQTNYIVADAGIHGIRRSA

GGCTGCGGTGACTGGGGTCAAAAGCCCAACGCCACAGCCTGCCCCGATGAGGCATCCTGC 

GCGDWGQKPNATACPDEASC 

GCTAAGAACTGTATCCTCAGTGGTATGGACTCAAACGCTTACAAGAACGCTGGTATCACT 

AKNCILSGMDSNAYKNAGIT 

ACTTCTGGCAACAAGCTTCGTCTTCAGCAGCTTATCAACAACCAGCTTGTTTCTCCTCGG 

TSGNKLRLQQLINNQLVSPR

GTTTATCTGCTTGAGGAGAACAAGAAGAAGTATGAGATGCTTCAGCTCACTGGTACTGAA 

VYLLEENKKKYEMLHLTGTE 

TTCTCTTTCGACGTTGAGATGGAGAAGCTTCCTTGTGGTATGAATGGTGCTTTGTACCTT 

FSFDVEMEKLPCGMNGALYL 

TCCGAGATGCCACAGGATGGTGGTAAGAGCACGAGCCGAAACAGCAAGGCTGGTGCCTAC 

SEMPQDGGKSTSRNSKAGAY

TATGGTGCTGGATACTGTGATGCTCAGTGCTACGTCactcctttcATCAACGGAGTTGGC 

YGAGYCDAQCYVTPFINGVG 

AACATCAAGGGACAGGGTGTCTGCTGTAACGAGCTCGACATCTGGGAGGCCAACTCCCGC 

NIKGQGVCCNELDIWEANSR 

GCAACTCACATTGCTCCTCACCCTTGCAGCAAGCCCGGCCTCTACGGCTGCACAGGCGAT 

ATHIAPHPCSKPGLYGCTGD 

GAGTGCGGCAGCTCCGGTTTCTGCGACAAGGCCGGCTGCGGCTGGAACCACAACCGCATC 

ECGSSGICDKAGCGWNHNRI 

AACGTGACCGACTTCTACGGCcgcggCAAGCAGTACAAGGTCGACAGCACCCGCAAGTTC 

NVTDFYGRGKQYKVDSTRKF

ACCGTGACATCTCAGTTCGTCGCCAACAAGCAGGGTGATCTCATCGAGCTGCACCGCCAC 

TVTSQFVANKQGDLIELHRH

TACATCCAGGACAACAAGGTCAtcgagtctgctgtcgtcaacatctccggccctcccaag 

YIQDNKVIESAVVNISGPPK 

atcaacttcatcaatgacaagtactgcgctgccaccggcgccaacgagtacatgcgcctc 

INFINDKYCAATGANEYMRL

ggcggtactaagcaaatgggcgatgccatgtcccgcggaatggttctcgccatgagcgtc 

GGTKQMGDAMSRGMVLAMSV 

tggtggagcgagggtgatttcatggcctggttggatcagggtgttgctggaccctgtgac 

WWSEGDFMAWLDQGVAGPCD 

gccaccgagggcgatcccaagaacatcgtcaaggtgcagcccaaccctgaagtgacattt 

ATEGDPKNIVKVQPNPEVTF 

agcaacatcagaattggagagattggatctacttcatcggtcaaggctcctgcgtatcct 

SNIRIGEIGSTSSVKAPAYP

ggtcctcaccgcttgtaaaaacatcaaacaacaccgtgtccaatatggATCTTAGTGTCC 

G P H R L • 

ACTTGCTGGGAAGCTATTGGAGCACATATGCAAAACAGATGTCCACTAGCTTGACACGTA 
TGTCGGGGCAAAAAAATCTCTTTCTAGGATAGGAGAACATATTGGGTGTTTGGACTTGTA 
TATAAATGATACATTTTTCATATTATATTATTTTCAACATATTTTATTTC 

AAAAAAAAAAAAAAAAAAAAAAAA 
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TTTCTTCGTCGAGCTCGAGTCGTCCGCCGTCTCCTCCTCCTCCTCCTTCCAGTCTTTGAG 
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TTCCTTCGACCTGCAGCGTCCTGAACAACTCGCTCTAGCTCAACAACCATGGCTCGCGGT 

MetAlaArgGly 



130 140 150 160 170 180 


ACCGCTCTCCTCGGCCTGACCGCGCTCCTCCTGGGGCTGGTCAACGGCCAGAAGCCTGGT 
ThrAlaLeuLeuGlyLeuThrAlaLeuLeuLeuGlyLeuValAsnGlyGlnLysProGly 



190 200 210 220 230 



240 
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GAGACCAAGGAGGTTCACCCCCAGCTCACGACCTTCCGCTGCACGAAGAGGGGTGGTTGC 
GluThrLysGluValHisProGlnLeuThrThrPheArgCysThrLysArgGlyGlyCys 

250 260 270 280 290 300 
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AAGCCGGCGACCAACTTCATCGTGCTTGACTCGCTGTCGCACCCCATCCACCGCGCTGAG 
LysProAlaThrAsnPheIleValLeuAspSerLeuSerHisProIleHisArgAlaGlu



Fig. 14A 
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GGCCTGGGCCCTGGCGGCTGCGGCGACTGGGGCAACCCGCCGCCCAAGGACGTCTGCCCG 
GlyLeuGlyProGlyGlyCysGlyAspTrpGlyAsnProProProLysAspValCysPro 

370 380 390 400 410 420 


GACGTCGAGTCGTGCGCCAAGAACTGCATCATGGAGGGCATCCCCGACTACAGCCAGTAC 
AspValGluSerCysAlaLysAsnCysIleMetGluGlyIleProAspTyrSerGlnTyr

430 440 450 460 470 480 


GGCGTCACCACCAACGGCACCAGCCTCCGCCTGCAGCACATCCTCCCCGACGGCCGCGTC 

GlyValThrThrAsnGlyThrSerLeuArgLeuGlnHisIleLeuProAspGlyArgVal

490 500 510 520 530 540 
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CCGTCGCCGCGTGTCTACCTGCTCGACAAGACGAAGCGCCGCTATGAGATGCTCCACCTG 
ProSerProArgValTyrLeuLeuAspLysThrLysArgArgTyrGluMetLeuHisLeu

550 560 570 580 590 600 
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ACCGGCTTCGAGTTCACCTTCGACGTCGACGCCACCAAGCTGCCCTGCGGCATGAACAGC 
ThrGlyPheGluPheThrPheAspValAspAlaThrLysLeuProCysGlyMetAsnSer 

610 620 630 640 650 660 
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GCTCTGTACCTGTCCGAGATGCACCCGACCGGTGCCAAGAGCAAGTACAACTCCGGCGGT 
AlaLeuTyrLeuSerGluMetHisProThrGlyAlaLysSerLysTyrAsnSerGlyGly 

Fig. 14B 
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ill ii i 

GCCTACTACGGTACTGGCTACTGCGATGCTCAGTGCTTCGTGACGCCCTTCATCAACGGC 
AlaTyrTyrGlyThrGlyTyrCysAspAlaGlnCysPheValThrProPheIleAsnGly
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TTGGGCAACATCGAGGGCAAGGGCTCGTGCTGCAACGAGATGGATATCTGGGAGGTCAAC 

LeuGlyAsnIleGluGlyLysGlySerCysCysAsnGluMetAspIleTrpGluValAsn

790 800 810 820 830 840 
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TCGCGCGCCTCGCACGTGGTTCCCCACACCTGCAACAAGAAGGGCCTGTACCTTTGCGAG 

SerArgAlaSerHisValValProHisThrCysAsnLysLysGlyLeuTyrLeuCysGlu 

850 860 870 880 890 900 
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GGTGAGGAGTGCGCCTTCGAGGGTGTTTGCGACAAGAACGGCTGCGGCTGGAACAACTAC 
GlyGluGluCysAlaPheGluGlyValCysAspLysAsnGlyCysGlyTrpAsnAsnTyr 

910 920 930 940 950 960 
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CGCGTCAACGTGACTGACTACTACGGCCGGGGCGAGGAGTTCAAGGTCAACACCCTCAAG 

ArgValAsnValThrAspTyrTyrGlyArgGlyGluGluPheLysValAsnThrLeuLys 

970 980 990 1000 1010 1020 
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CCCTTCACCGTCGTCACTCAGTTCTTGGCCAACCGCAGGGGCAAGCTCGAGAAGATCCAC 
ProPheThrValValThrGlnPheLeuAlaAsnArgArgGlyLysLeuGluLysIleHis 



Fig. 14C 
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CGCTTCTACGTGCAGGACGGCAAGGTCATCGAGTCCTTCTACACCAACAAGGAGGGAGTC 

ArgPheTyrValGlnAspGlyLysValIleGluSerPheTyrThrAsnLysGluGlyVal

1090 1100 1110 1120 1130 1140 


CCTTACACCAACATGATCGATGACGAGTTCTGCGAGGCCACCGGCTCCCGCAAGTACATG 

ProTyrThrAsnMetIleAspAspGluPheCysGluAlaThrGlySerArgLysTyrMet

1150 1160 1170 1180 1190 1200 
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GAGCTCGGCGCCACCCAGGGCATGGGCGAGGCCCTCACCCGCGGCATGGTCCTGGCCATG 
GluLeuGlyAlaThrGlnGlyMetGlyGluAlaLeuThrArgGlyMetValLeuAlaMet 

1210 1220 1230 1240 1250 1260 
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AGCATCTGGTGGGACCAGGGCGGCAACATGGAGTGGCTCGACCACGGCGAGGCCGGCCCC 
SerIleTrpTrpAspGlnGlyGlyAsnMetGluTrpLeuAspHisGlyGluAlaGlyPro

1270 1280 1290 1300 1310 1320 
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TGCGCCAAGGGCGAGGGCGCCCCGTCCAACATTGTCCAGGTTGAGCCCTTCCCCGAGGTC 
CysAlaLysGlyGluGlyAlaProSerAsnIleValGlnValGluProPheProGluVal

1330 1340 1350 1360 1370 1380 
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Ill ill 

ACCTACACCAACCTCCGCTGGGGCGAGATCGGCTCGACCTACCAGGAGGTTCAGAAGCCT 
ThrTyrThrAsnLeuArgTrpGlyGluIleGlySerThrTyrGlnGluValGlnLysPro 
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AAGCCCAAGCCCGGCCACGGCCCCCGGAGCGACTAAGTGGTGATGGGATAGAGGGATAGA 
LysProLysProGlyHisGlyProArgSerAspEKO 
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ATAGTGGATAGCACATAGATCGGCGGTTTTGGATAGTTTAATACATTCCGTTGCCGTTGT 
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ATGCGTTCCTCCCCCCTCCTCCCGTCCGCCGTTGTGGCCGCCCTGCCGGTGTTGGCCCTT 

METArgSerSerProLeuLeuProSerAlaValValAlaAlaLeuProValLeuAlaLeu 
43 kdal signal peptide and N terminal

70 80 90 100 110 120 
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GCCGCTGATGGCAGGAGTGATGTCACTTTCACGATTAATACGCAGTCGGAACGTGCAGCG 

AlaAlaAspGlyArgSerAspValThrPheThrIleAsnThrGlnSerGluArgAlaAla
N terminal 

130 140 150 160 170 180 
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ATCAGCCCCAATATTTACGGAACCAATCAGGATCTGAGCGGGACGGAGAACTGGTCATCC 
IleSerProAsnIleTyrGlyThrAsnGlnAspLeuSerGlyThrGluAsnTrpSerSer

190 200 210 220 230 240 
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CGCAGGCTCGGAGGCAACCGGCTGACGGGTTACAACTGGGAGAACAACGCATCCAGCGCC 
ArgArgLeuGlyGlyAsnArgLeuThrGlyTyrAsnTrpGluAsnAsnAlaSerSerAla 

250 260 270 280 290 300 


GGAAGGGACTGGCTTCATTACAGCGATGATTTTCTCTGCGGCAACGGTGGTGTTCCAGAC 

GlyArgAspTrpLeuHisTyrSerAspAspPheLeuCysGlyAsnGlyGlyValProAsp 

Endo 1 core 

310 320 330 340 350 360
ACCGACTGCGACAAGCCGGGGGCGGTTGTTACCGCTTTTCACGATAAATCTTTGGAGAAT
ThrAspCysAspLysProGlyAlaValValThrAlaPheHisAspLysSerLeuGluAsn

370 380 390 400 410 420
GGAGCTTACTCCATTGTAACGCTGCAAATGGCGGGTTATGTGTCCCGGGATAAGAACGGT
GlyAlaTyrSerIleValThrLeuGlnMETAlaGlyTyrValSerArgAspLysAsnGly

430 440 450 460 470 480 
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CCAGTTGACGAGAGTGAGACGGCTCCGTCACCGCGTTGGGATAAGGTCGAGTTTGCCAAA 
ProValAspGluSerGluThrAlaProSerProArgTrpAspLysValGluPheAlaLys 

Fig. 15A
AATGCGCCGTTCTCCCTTCAGCCTGATCTGAACGACGGACAAGTGTATATGGATGAAGAA 

AsnAlaProPheSerLeuGlnProAspLeuAsnAspGlyGlnValTyrMETAspGluGlu 

550 560 570 580 590 600 

GTTAACTTCCTGGTCAACCGGTATGGAAACGCTTCAACGTCAACGGGCATCAAAGCGTAT 

ValAsnPheLeuValAsnArgTyrGlyAsnAlaSerThrSerThrGlyIleLysAlaTyr
610 620 630 640 650 660 
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TCGCTGGATAACGAGCCGGCGCTGTGGTCTGAGACGCATCCAAGGATTCATCCGGAGCAG 
SerLeuAspAsnGluProAlaLeuTrpSerGluThrHisProArgIleHisProGluGln

670 680 690 700 710 720 

TTACAAGCGGCAGAACTCGTCGCTAAGAGCATCGACTTGTCAAAGGCGGTGAAGAACGTC 

LeuGlnAlaAlaGluLeuValAlaLysSerIleAspLeuSerLysAlaValLysAsnVal

730 740 750 760 770 780 
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GATCCGCATGCCGAAATATTCGGTCCTGCCCTTTACGGTTTCGGCGCATATTTGTCTCTG 
AspProHisAlaGluIlePheGlyProAlaLeuTyrGlyPheGlyAlaTyrLeuSerLeu 

790 800 810 820 830 840 

CAGGACGCACCGGATTGGCCGAGTTTGCAAGGCAACTACAGCTGGTTTATCGATTACTAT 

GlnAspAlaProAspTrpProSerLeuGlnGlyAsnTyrSerTrpPheIleAspTyrTyr

850 860 870 880 890 900 
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CTGGATCAGATGAAGAATGCTCATACGCAGAACGGCAAAAGATTGCTCGATGTGCTGGAC 

LeuAspGlnMETLysAsnAlaHisThrGlnAsnGlyLysArgLeuLeuAspValLeuAsp

910 920 930 940 950 960 

GTCCACTGGTATCCGGAAGCACAGGGCGGAGGCCAGCGAATCGTCTTTGGCGGGGCGGGC 

ValHisTrpTyrProGluAlaGlnGlyGlyGlyGlnArgIleValPheGlyGlyAlaGly

Fig. 15B 
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970 980 990 1000 1010 1020 

AATATCGATACGCAGAAGGCTCGCGTACAAGCGCCAAGATCGCTATGGGATCCGGCTTAC 
AsnIleAspThrGlnLysAlaArgValGlnAlaProArgSerLeuTrpAspProAlaTyr

1030 1040 1050 1060 1070 1080 
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CAGGAAGACAGCTGGATCGGCACATGGTTTTCAAGCTACTTGCCCTTAATTCCGAAGCTG 

GlnGluAspSerTrpIleGlyThrTrpPheSerSerTyrLeuProLeuIleProLysLeu
1090 1100 1110 1120 1130 1140 

CAATCTTCGATTCAGACGTATTATCCGGGTACGAAGCTGGCGATCACAGAGTTCAGCTAC 
GlnSerSerIleGlnThrTyrTyrProGlyThrLysLeuAlaIleThrGluPheSerTyr

1150 1160 1170 1180 1190 1200 


GGCGGAGACAATCACATTTCGGGAGGCATAGCTACCGCGGACGCGCTCGGCATTTTTGGA 

GlyGlyAspAsnHisIleSerGlyGlyIleAlaThrAlaAspAlaLeuGlyIlePheGly
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AAATATGGCGTTTATGCCGCGAATTACTGGCAGACGGAGGACAATACCGATTATACCAGC 
LysTyrGlyValTyrAlaAlaAsnTyrTrpGlnThrGluAspAsnThrAspTyrThrSer 

1270 1280 1290 1300 1310 1320 
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GCTGCTTACAAGCTGTATCGCAACTACGACGGCAATAAATCGGGGTTCGGCTCGATCAAA 
AlaAlaTyrLysLeuTyrArgAsnTyrAspGlyAsnLysSerGlyPheGlySerIleLys

1330 1340 1350 1360 1370 1380 

GTGGACGCCGCTACGTCCGATACGGAGAACAGCTCGGTATACGCTTCGGTAACTGACGAG 
ValAspAlaAlaThrSerAspThrGluAsnSerSerValTyrAlaSerValThrAspGlu 

1390 1400 1410 1420 1430 1440 
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GAGAATTCCGAACTCCACCTGATCGTGCTGAATAAAAATTTCGACGATCCGATCAACGCT 
GluAsnSerGluLeuHisLeuIleValLeuAsnLysAsnPheAspAspProIleAsnAla 

Fig. 15C 
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ACTTTCCAGCTGTCTGGTGATAAAACCTACACATCCGGGAGAGTATGGGGCTTCGACCAA 
ThrPheGlnLeuSerGlyAspLysThrTyrThrSerGlyArgValTrpGlyPheAspGln 
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ACCGGATCCGACATTACGGAACAAGCAGCTATAACGAATATTAACAACAATCAATTCACG 

ThrGlySerAspIleThrGluGlnAlaAlaIleThrAsnIleAsnAsnAsnGlnPheThr
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TATACGCTTCCTCCATTGTCGGCTTACCACATTGTTCTGAAAGCGGATAGCACCGAACCG 
TyrThrLeuProProLeuSerAlaTyrHisIleValLeuLysAlaAspSerThrGluPro

1630 1640 1650 1660 1670 1680 
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GTCATCTCCGAGATCCCCTCCAGCAGCACCAGCTCTCCGGTCAACCAGCCTACCAGCACC 

ValIleSerGluIleProSerSerSerThrSerSerProValAsnGlnProThrSerThr
Linker 43 kdal B region
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AGCACCACGTCCACCTCCACCACCTCGAGCCCGCCAGTCCAGCCTACGACTCCCAGCGGC 
SerThrThrSerThrSerThrThrSerSerProProValGlnProThrThrProSerGly 

1750 1760 1770 1780 1790 1800 
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TGCACTGCTGAGAGGTGGGCTCAGTGCGGCGGCAATGGCTGGAGCGGCTGCACCACCTGC 

CysThrAlaGluArgTrpAlaGlnCysGlyGlyAsnGlyTrpSerGlyCysThrThrCys 

43 kdal A region

1810 1820 1830 1840 1850 
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GTCGCTGGCAGCACTTGCACGAAGATTAATGACTGGTACCATCAGTGCCTGTAG 
ValAlaGlySerThrCysThrLysIleAsnAspTrpTyrHisGlnCysLeu 
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